Most Cited CVPR "object categories" Papers
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian, Kuan-Chieh Wang, Or Patashnik et al.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou, Hui Ren, Yijia Weng et al.
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
Tianyi Zhu, Dongwei Ren, Qilong Wang et al.
Exploring Temporally-Aware Features for Point Tracking
Inès Hyeonsu Kim, Seokju Cho, Gabriel Huang et al.
Style-Editor: Text-driven Object-centric Style Editing
Jihun Park, Jongmin Gim, Kyoungmin Lee et al.
Locally Orderless Images for Optimization in Differentiable Rendering
Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi
Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention
Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci
A Dataset for Semantic Segmentation in the Presence of Unknowns
Zakaria Laskar, Tomas Vojir, Matej Grcic et al.
Light Transport-aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes
Ludwic Leonard, Nils Thuerey, Rüdiger Westermann
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
Hao Tan, Zichang Tan, Jun Li et al.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.
Adaptive Parameter Selection for Tuning Vision-Language Models
Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo et al.
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan, Zeshi Yang, Zhiyang Dou et al.
ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning
Haoyuan Yang, Xiaoou Li, Jiaming Lv et al.
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang, Xuan Dong, Xizhou Zhu et al.
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.
Free Lunch Enhancements for Multi-modal Crowd Counting
Haoliang Meng, Xiaopeng Hong, Zhengqin Lai et al.
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
German Barquero, Nadine Bertsch, Manojkumar Marramreddy et al.
Efficient Personalization of Quantized Diffusion Model without Backpropagation
Hoigi Seo, Wongi Jeong, Kyungryeol Lee et al.
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu, Kun Yuan, Qizhi Xie et al.
Extreme Rotation Estimation in the Wild
Hana Bezalel, Dotan Ankri, Ruojin Cai et al.
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou, Shuli Cheng, Jiayi Chen
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
Long Tung Vuong, Hoang Phan, Vy Vo et al.
EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
Xinran Yang, Donghao Ji, Yuanqi Li et al.
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.
CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis
Youngkyoon Jang, Eduardo Pérez-Pellitero
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain et al.
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang, Shuai Liu, Hongming Guo et al.
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian, Jun Li, Jinpeng Wang et al.
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama et al.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
Max Gutbrod, David Rauber, Danilo Weber Nunes et al.
TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification
Dongyoon Yang, Jihu Lee, Yongdai Kim
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics
Tahira Kazimi, Ritika Allada, Pinar Yanardag
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun, Seokhyeon Hong, Chaelin Kim et al.
Learning with Noisy Triplet Correspondence for Composed Image Retrieval
Shuxian Li, Changhao He, Xiting Liu et al.
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao, Daoyuan Chen, Yilun Huang et al.
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
Vaibhav Rathore, Shubhranil B, Saikat Dutta et al.
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon et al.
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
Xiaofu Chen, Yaxin Luo, Luo et al.
Multi-View Pose-Agnostic Change Localization with Zero Labels
Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim et al.
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
Dian Shao, Mingfei Shi, Shengda Xu et al.
HVI: A New Color Space for Low-light Image Enhancement
Qingsen Yan, Yixu Feng, Cheng Zhang et al.
LMO: Linear Mamba Operator for MRI Reconstruction
Wei Li, Jiawei Jiang, Jie Wu et al.
Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation
Yanda Chen, Gongwei Chen, Miao Zhang et al.
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng, Tongjia Chen, Shoubin Yu et al.
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning
Huu Binh Ta, Duc Nguyen, Quyen Tran et al.
T-FAKE: Synthesizing Thermal Images for Facial Landmarking
Philipp Flotho, Moritz Piening, Anna Kukleva et al.
A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains
Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada
Focal Split: Untethered Snapshot Depth from Differential Defocus
Junjie Luo, John Mamish, Alan Fu et al.
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation
Qi Zhang, Jibin Peng, Zhao Huang et al.
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
Byung Hyun Lee, Sungjin Lim, Se Young Chun
Continuous Space-Time Video Resampling with Invertible Motion Steganography
Yuantong Zhang, Zhenzhong Chen
Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency
Hyunho Ha, Lei Xiao, Christian Richardt et al.
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Jiayi Guo, Zhao Junhao, Chaoqun Du et al.
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
Qiyuan Dai, Sibei Yang
OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining
Bingzhi Chen, Sisi Fu, Xiaocheng Fang et al.
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang et al.
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Yang Wu, Yun Zhu, Kaihua Zhang et al.
SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks
Shining Wang, Yunlong Wang, Ruiqi Wu et al.
V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy
Jiayin Zhao, Zhenqi Fu, Tao Yu et al.
A Unified Framework for Heterogeneous Semi-supervised Learning
Marzi Heidari, Abdullah Alchihabi, Hao Yan et al.
SLADE: Shielding against Dual Exploits in Large Vision-Language Models
Md Zarif Hossain, Ahmed Imteaj
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis
Yinghao Wu, Shihui Guo, Yipeng Qin
Learning to Filter Outlier Edges in Global SfM
Nicole Damblon, Marc Pollefeys, Daniel Barath
Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling
Zhaoyu Zhang, Yang Hua, Guanxiong Sun et al.
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
Bikang Pan, Qun Li, Xiaoying Tang et al.
No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
Rong Qin, Xin Liu, Xingyu Liu et al.
Where's the Liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai, Yiyou Sun, Wei Cheng et al.
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang, Yukai Shi, Jiarong Ou et al.
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
Xianwei Zhuang, Zhihong Zhu, Yuxin Xie et al.
Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
Ying Jin, Jinlong Peng, Qingdong He et al.
CoMatcher: Multi-View Collaborative Feature Matching
Jintao Zhang, Zimin Xia, Mingyue Dong et al.
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram
Sifan Zhou, Zhihang Yuan, Dawei Yang et al.
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather
Longyu Yang, Ping Hu, Shangbo Yuan et al.
Generalizable Object Keypoint Localization from Generative Priors
Dongkai Wang, Jiang Duan, Liangjian Wen et al.
Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss
Ravishankar Evani, Deepu Rajan, Shangbo Mao
Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model
Yingmao Miao, Zhanpeng Huang, Rui Han et al.
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
Emanuele Aiello, Umberto Michieli, Diego Valsesia et al.
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang, Chang Liu, Jin Wei et al.
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects
Woojin Lee, Hyugjae Chang, Jaeho Moon et al.
Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition
Anqi Zhu, Jingmin Zhu, James Bailey et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
All-directional Disparity Estimation for Real-world QPD Images
Hongtao Yu, Shaohui Song, Lihu Sun et al.
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Arnav Mohanty Das, Gantavya Bhatt, Lilly Kumari et al.
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
Mengqiu Xu, Kaixin Chen, Heng Guo et al.
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
Kunyu Wang, Xueyang Fu, Xin Lu et al.
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
Yuanhao Zou, Zhaozheng Yin
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs
Zicheng Zhang, Ziheng Jia, Haoning Wu et al.
MAD: Memory-Augmented Detection of 3D Objects
Ben Agro, Sergio Casas, Patrick Wang et al.
Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights
Ondrej Tybl, Lukas Neumann
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection
Fan Xing, Zhuo Tian, Xuefeng Fan et al.
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
Zhang Lintong, Kang Yin, Seong-Whan Lee
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Hao Cheng, Erjia Xiao, Jiayan Yang et al.
Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang, Jiahao Wang, Sucheng Ren et al.
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen et al.
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
Yuzhuo Dai, Jiaqi Jin, Zhibin Dong et al.
Autoregressive Sequential Pretraining for Visual Tracking
Shiyi Liang, Yifan Bai, Yihong Gong et al.
Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu, Xinting Hu, Yuyang Sun et al.
Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection
Jiangyi Wang, Na Zhao
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Chenkai Zhang, Yiming Lei, Zeming Liu et al.
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction
Jinguang Tong, Xuesong Li, Fahira Afzal Maken et al.
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Cheng Zhang, Haofei Xu, Qianyi Wu et al.
LEDiff: Latent Exposure Diffusion for HDR Generation
Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
Wonjoon Jin, Qi Dai, Chong Luo et al.
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu, Ligeng Zhu, Baifeng Shi et al.
Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval
Siyuan Duan, Yuan Sun, Dezhong Peng et al.
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.
Seeing More with Less: Human-like Representations in Vision Models
Andrey Gizdov, Shimon Ullman, Daniel Harari
Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality
Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam et al.
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
Ali Athar, Xueqing Deng, Liang-Chieh Chen
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images
Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li et al.
Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding
Zhaoran Zhao, Peng Lu, Anran Zhang et al.
Dense-SfM: Structure from Motion with Dense Consistent Matching
JongMin Lee, Sungjoo Yoo
Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation
Xiumei Xie, Zikai Huang, Wenhao Xu et al.
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
Yutao Tang, Yuxiang Guo, Deming Li et al.
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects
Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.
TransPixeler: Advancing Text-to-Video Generation with Transparency
Luozhou Wang, Yijun Li, ZhiFei Chen et al.
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering
Guofeng Feng, Siyan Chen, Rong Fu et al.
Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models
Daniel Samira, Edan Habler, Yuval Elovici et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
ERUPT: Efficient Rendering with Unposed Patch Transformer
Maxim Shugaev, Vincent Chen, Maxim Karrenbach et al.
Improved Monocular Depth Prediction Using Distance Transform Over Pre-semantic Contours with Self-supervised Neural Networks
Marwane Hariat, Antoine Manzanera, David Filliat
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Chongjian Ge, Chenfeng Xu, Yuanfeng Ji et al.
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Beilin Chu, Xuan Xu, Xin Wang et al.
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano, Min Woo Sun, James Burgess et al.
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li, Liang Wang, Chao Wang et al.
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Xin Wang, Kai Chen, Jiaming Zhang et al.
Towards Practical Real-Time Neural Video Compression
Zhaoyang Jia, Bin Li, Jiahao Li et al.
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.
Binarized Neural Network for Multi-spectral Image Fusion
Junming Hou, Xiaoyu Chen, Ran Ran et al.
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
Zichen Tang, Yuan Yao, Miaomiao Cui et al.
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
Huaxin Zhang, Xiaohao Xu, Xiang Wang et al.
MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework
Ping Guo, Cheng Gong, Fei Liu et al.
Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion
Xiangfeng Xu, Pinyi Zhang, Wenxuan Huang et al.
Disentangled Pose and Appearance Guidance for Multi-Pose Generation
Tengfei Xiao, Yue Wu, Yuelong Li et al.
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
Jiazi Bu, Pengyang Ling, Pan Zhang et al.
Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning
Xiaohan Zou, Wenchao Ma, Shu Zhao
Convex Combination Star Shape Prior for Data-driven Image Semantic Segmentation
Xinyu Zhao, Jun Xie, Shengzhe Chen et al.
Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification
Haobin Zhong, Shuai He, Anlong Ming et al.
Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data
Wenxin Su, Song Tang, Xiaofeng Liu et al.
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
Qingyuan Wang, Rui Song, Jiaojiao Li et al.
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis
You Wang, Li Fang, Hao Zhu et al.
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney et al.
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining
Shangquan Sun, Wenqi Ren, Juxiang Zhou et al.
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection
Ming Sun, Rui Wang, Zixuan Zhu et al.
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
Keyizhi Xu, Chi Zhang, Zhan Chen et al.
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan, Dongming Wu, Wencheng Han et al.
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Jiajun Cao, Yuan Zhang, Tao Huang et al.
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue, Yulin Wang, Haojun Jiang et al.
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing "Jed" Yang, Alexander Sax, Kevin Liang et al.
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
Dong Li, Wenqi Zhong, Wei Yu et al.
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin, Dingkang Liang, Zhenghao Qi et al.
3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping
Chenhui Shi, Fulin Tang, Ning An et al.
STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation
Yisi Luo, Xile Zhao, Kai Ye et al.
Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets
Muhammad Abdullah Jamal, Omid Mohareri
Font-Agent: Enhancing Font Understanding with Large Language Models
Yingxin Lai, Cuijie Xu, Haitian Shi et al.
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
Greg Heinrich, Mike Ranzinger, Danny Yin et al.
Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning
Shouhang Zhu, Chenglin Li, Yuankun Jiang et al.
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei, Hang Wang, Bingbing Ni
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
Zesen Cheng, Hang Zhang, Kehan Li et al.
GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos
Soohyun Lee, SeoYeon Kim, HeeKyung Lee et al.
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
Saad Lahlali, Sandra Kara, Hejer Ammar et al.
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.
GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes
Yunxuan Li, Lei Fan, Xiaoying Xing et al.
Be More Specific: Evaluating Object-centric Realism in Synthetic Images
Anqi Liang, Ciprian Adrian Corneanu, Qianli Feng et al.
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
Jungin Park, Jiyoung Lee, Kwanghoon Sohn
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization
Junhao Xu, Yanan Zhang, Zhi Cai et al.
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
Qiang Zhang, Mengsheng Zhao, Jiawei Liu et al.
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation
Ziyu Zhao, Xiaoguang Li, Lingjia Shi et al.
Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction
Ning Ni, Libao Zhang
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, Duo Peng et al.
Segment Any Motion in Videos
Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.
TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model
Zhichao Zhai, Guikun Chen, Wenguan Wang et al.
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
Shuo Wang, Wanting Li, Yongcai Wang et al.
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
Kai Mao, Ping Wei, Yiyang Lian et al.
Augmenting Perceptual Super-Resolution via Image Quality Predictors
Fengjia Zhang, Samrudhdhi Rangrej, Tristan T Aumentado-Armstrong et al.
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network
Zhuochen Yu, Bijie Qiu, Andy W. H. Khong
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
Ziang Li, Hongguang Zhang, Juan Wang et al.
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging
Ping Wang, Lishun Wang, Gang Qu et al.
Compositional Targeted Multi-Label Universal Perturbations
Hassan Mahmood, Ehsan Elhamifar
CGMatch: A Different Perspective of Semi-supervised Learning
Bo Cheng, Jueqing Lu, Yuan Tian et al.
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection
Jinhyung Park, Navyata Sanghvi, Hiroki Adachi et al.
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Jianzong Wu, Chao Tang, Jingbo Wang et al.
CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
Xuli Shen, Hua Cai, Weilin Shen et al.
Dynamic Motion Blending for Versatile Motion Editing
Nan Jiang, Hongjie Li, Ziye Yuan et al.
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions
Qiang Li, Jian Ruan, Fanghao Wu et al.