Most Cited CVPR "functional neuroimaging" Papers
5,589 papers found • Page 9 of 28
Conference
Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen et al.
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
Sean I. Young, Yaël Balbastre, Bruce Fischl et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang, Yuehuai LIU, Yu-Wing Tai et al.
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han, Junfeng Lyu, Feng Xu
AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, chenduo hao et al.
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye, Weiyang Liu, Pan He
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, honghui xu et al.
Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
Junjie Chen, Jiebin Yan, Yuming Fang et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi, Keita Takahashi, Chihiro Tsutake et al.
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu et al.
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan, Yan Zhang, Shuqiang Cai et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani
Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li, Ziling Chen, Xiaolong Wu et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko
Motion Diversification Networks
Hee Jae Kim, Eshed Ohn-Bar
TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease
Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
Boosting Flow-based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang, Yang Liu
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Fangyin Wei, Hanlin Chen, Gim Hee Lee
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu et al.
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
SapiensID: Foundation for Human Recognition
Minchul Kim, Dingqiang Ye, Yiyang Su et al.
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.
DiffFNO: Diffusion Fourier Neural Operator
Xiaoyi Liu, Hao Tang
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement
Hesong Li, Ziqi Wu, Ruiwen Shao et al.
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai et al.
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim, Dongha Lee, SeongKu Kang et al.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang et al.
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong, Angela Yao
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu, Rui Li, Yu Zhu et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
Generative Zero-Shot Composed Image Retrieval
Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, XINJIE ZHANG et al.
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu, Yuxin Peng
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Sen Wang, Le Wang, Sanping Zhou et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao, Shuang Wang, Qi Zang et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Ziheng Zhang, Jianyang Gu, Arpita Chowdhury et al.
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generationf
Yanis Benidir, Nicolas Gonthier, Clement Mallet
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry, Jacob Krantz, Stefan Lee
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le, Chenhui Gou, Stavya Datta et al.
Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
Yuanzhen Li, Fei LUO, Chunxia Xiao
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan et al.
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu, Miaomiao Cai, Yinda Chen et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson, Filippo Aleotti, Mohamed Sayed et al.
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi, Kun JIANG, Ke Wang et al.
Event-based Structure-from-Orbit
Ethan Elms, Yasir Latif, Tae Ha Park et al.
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi, Manish Singh, Hong Cai et al.
Clustering for Protein Representation Learning
Ruijie Quan, Wenguan Wang, Fan Ma et al.
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Linfang Zheng, Tze Ho Elden Tse, Chen Wang et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Ling Gao, Daniel Gehrig, Hang Su et al.
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng, Jiehua Zhang, Xichun Sheng et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen et al.
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
Hao Xu, Xiaolin Wu, Xi Zhang
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu, Hau San Wong, Si Wu et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun, Jae-Han Lee, Chang-Su Kim
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai, HangChen, Jun Du et al.
SfM-Free 3D Gaussian Splatting via Hierarchical Training
Bo Ji, Angela Yao
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang, Pengpeng Zeng et al.
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang, Jian Yin, Haoran Xie et al.
Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hongsheng Li et al.
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
Xuecheng Wu, Heli Sun, Yifan Wang et al.
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim, Tae-Kyun Kim
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
zefeng zhang, Hengzhu Tang, Jiawei Sheng et al.
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin, Kongming Liang, Bing Li et al.
L-MAGIC: Language Model Assisted Generation of Images with Coherence
zhipeng cai, Matthias Mueller, Reiner Birkl et al.
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato, Justin Yue, Nanze Chen et al.
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu, Ke Zhu
The Computer Vision Foundation
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
weixing chen, Yang Liu, Binglin Chen et al.
Hyperbolic Category Discovery
Yuanpei Liu, Zhenqi He, Kai Han
Dynamic Updates for Language Adaptation in Visual-Language Tracking
Xiaohai Li, Bineng Zhong, Qihua Liang et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
Panorama Generation From NFoV Image Done Right
Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-marc Odobez
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu, Yuxin Zheng, Zelong Li et al.
Towards Generalizable Scene Change Detection
Jae-Woo KIM, Ue-Hwan Kim
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash et al.
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee, Gim Hee Lee
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang, Yixin Chen, Zan Wang et al.
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou