Most Cited 2025 "remote sensing video" Papers
Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies
Hyunchai Jeong, Adiba Ejaz, Jin Tian et al.
Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
Chong Cheng, Gaochao Song, Yiyang Yao et al.
BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
Cunhang Fan, Enrui Liu, Andong Li et al.
Visually Consistent Hierarchical Image Classification
Seulki Park, Youren Zhang, Stella Yu et al.
Factor Graph-based Interpretable Neural Networks
Yicong Li, Kuanjiu Zhou, Shuo Yu et al.
Spectral Convolutional Conditional Neural Process
Peiman Mohseni, Nick Duffield
Low-Rank Adapting Models for Sparse Autoencoders
Matthew Chen, Josh Engels, Max Tegmark
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
Senmao Li, Kai Wang, Joost van de Weijer et al.
CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models
Junbo Yin, Chao Zha, Wenjia He et al.
Reinforcement Learning for Quantum Control under Physical Constraints
Jan Ole Ernst, Aniket Chatterjee, Tim Franzmeyer et al.
End-to-End Vision Tokenizer Tuning
Wenxuan Wang, Fan Zhang, Yufeng Cui et al.
Towards Generalizable Multi-Camera 3D Object Detection via Perspective Rendering
Hao Lu, Yunpeng Zhang, Guoqing Wang et al.
Semi-Supervised Online Cross-Modal Hashing
Xiao Kang, Xingbo Liu, Xuening Zhang et al.
Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
Lin-Zhuo Chen, Kangjie Liu, Youtian Lin et al.
Enhancing SQL Query Generation with Neurosymbolic Reasoning
Henrijs Princis, Cristina David, Alan Mycroft
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
Wei Hung, Shao-Hua Sun, Ping-Chun Hsieh
Training with “Paraphrasing the Original Text” Teaches LLM to Better Retrieve in Long-Context Tasks
Yijiong Yu, Yongfeng Huang, Zhixiao Qi et al.
When, Where and Why to Average Weights?
Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
Ruifeng Li, Mingqian Li, Wei Liu et al.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
Yawei Li, David Rügamer, Bernd Bischl et al.
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang, Adam Dziedzic, Grace Kim et al.
Partially Observable Reinforcement Learning with Memory Traces
Onno Eberhard, Michael Muehlebach, Claire Vernade
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
Yuxuan Zhou, Xien Liu, Chen Ning et al.
CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search
Xiao-Wen Yang, Zhi Zhou, Haiming Wang et al.
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Vindula Jayawardana, Baptiste Freydt, Ao Qu et al.
Zero-Shot Offline Imitation Learning via Optimal Transport
Thomas Rupf, Marco Bagatella, Nico Gürtler et al.
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
Jihwan Yoon, Sangbeom Han, Jaeseok Oh et al.
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
Insoo Kim, Hana Lee, Hyong-Euk Lee et al.
Geometric Hyena Networks for Large-scale Equivariant Learning
Artem Moskalev, Mangal Prakash, Junjie Xu et al.
Is Limited Participant Diversity Impeding EEG-based Machine Learning?
Philipp Bomatter, Henry Gouk
Clustering Properties of Self-Supervised Learning
Xi Weng, Jianing An, Xudong Ma et al.
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo, Fan Ma, Kaixin Shen et al.
Noisy Test-Time Adaptation in Vision-Language Models
Chentao Cao, Zhun Zhong, (Andrew) Zhanke Zhou et al.
Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-Index Models
Ilias Diakonikolas, Giannis Iakovidis, Daniel Kane et al.
Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
Abhijeet Mulgund, Chirag Pabbaraju
Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao, Weixun Luo, Junpeng Jing et al.
KAES: Multi-aspect Shared Knowledge Finding and Aligning for Cross-prompt Automated Scoring of Essay Traits
Xia Li, Wenjing Pan
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
Wonyong Seo, Jihyong Oh, Munchurl Kim
Believing is Seeing: Unobserved Object Detection using Generative Models
Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin, Valery Weber, Ahmed Nassar et al.
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen et al.
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
Jiuchen Chen, Xinyu Yan, Qizhi Xu et al.
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang et al.
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Xi Wang, Hongzhen Li, Heng Fang et al.
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition
Chuanfu Shen, Rui Wang, Lixin Duan et al.
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
Muchen Li, Sammy Christen, Chengde Wan et al.
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.
GENIUS: A Generative Framework for Universal Multimodal Search
Sungyeon Kim, Xinliang Zhu, Xiaofan Lin et al.
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim, Hea In Jeong, JungHoon Sung et al.
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang et al.
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li, Zixuan Huang, Anh Thai et al.
Learning Affine Correspondences by Integrating Geometric Constraints
Pengju Sun, Banglei Guan, Zhenbao Yu et al.
BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering
Minye Wu, Haizhao Dai, Kaixin Yao et al.
Context-Aware Multimodal Pretraining
Karsten Roth, Zeynep Akata, Dima Damen et al.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
Universal Scene Graph Generation
Shengqiong Wu, Hao Fei, Tat-seng Chua
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers
Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner et al.
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi, Jing Xu, Lijing Lu et al.
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang, Hao Lu, Qingyong Hu et al.
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Wei Li, Pin-Yu Chen, Sijia Liu et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Shengqiong Wu, Hao Fei, Jingkang Yang et al.
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian Liu et al.
Reasoning to Attend: Try to Understand How <SEG> Token Works
Rui Qian, Xin Yin, Dejing Dou
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
Yu Cao, Zengqun Zhao, Ioannis Patras et al.
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures
Guoxing Sun, Rishabh Dabral, Heming Zhu et al.
MixerMDM: Learnable Composition of Human Motion Diffusion Models
Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.
Localizing Events in Videos with Multimodal Queries
Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma et al.
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Wenrui Cai, Qingjie Liu, Yunhong Wang
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao, Menglu Wang, King Man Tam et al.
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang, Xiaoming Liu, Yu Kong
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
Chi Su, Xiaoxuan Ma, Jiajun Su et al.
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
Jeonghyeon Kim, Sangheum Hwang
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
Yitang Li, Mingxian Lin, Zhuo Lin et al.
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang, Lin Zhu, Zewen Sun et al.
GASP: Gaussian Avatars with Synthetic Priors
Jack Saunders, Charlie Hewitt, Yanan Jian et al.
Faster Parameter-Efficient Tuning with Token Redundancy Reduction
Kwonyoung Kim, Jungin Park, Jin Kim et al.
Test-Time Visual In-Context Tuning
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr et al.
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
Fangyu Wu, Yuhao Chen
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.
MultiMorph: On-demand Atlas Construction
Mazdak Abulnaga, Andrew Hoopes, Neel Dey et al.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo et al.
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
Long Tung Vuong, Hoang Phan, Vy Vo et al.
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
Senmao Li, Lei Wang, Kai Wang et al.
Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation
Jiao Xu, Xin Chen, Lihe Zhang
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.
Geometry in Style: 3D Stylization via Surface Normal Deformation
Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong, Bokai Lai, Yiyi Zhou et al.
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen, Yikai Wang, Wenqiang Sun et al.
Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling
Xingyu Chen, Zihao Feng, Kun Qian et al.
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
Yancong Lin, Shiming Wang, Liangliang Nan et al.
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang, Xiaolu Liu, Lingdong Kong et al.
Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers
Ji Zhao, Banglei Guan, Zibin Liu et al.
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
Haiyang Mei, Pengyu Zhang, Mike Zheng Shou
ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images
Yanqing Shen, Turcan Tuna, Marco Hutter et al.
T-FAKE: Synthesizing Thermal Images for Facial Landmarking
Philipp Flotho, Moritz Piening, Anna Kukleva et al.
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views
Ningli Xu, Rongjun Qin
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation
Yuhan Liu, Yixiong Zou, Yuhua Li et al.
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin, Stella X. Yu
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang, Zekun Li, Jian Zhang et al.
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization
Xiran Wang, Jian Zhang, Lei Qi et al.
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
Hongkai Wei, Yang Yang, Shijie Sun et al.
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank, Jim Davis
Order-One Rolling Shutter Cameras
Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano et al.
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
Emanuele Aiello, Umberto Michieli, Diego Valsesia et al.
Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation
David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang et al.
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang, Chao Xu, Cheng Yu et al.
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Shian Du, Menghan Xia, Chang Liu et al.
Optimizing for the Shortest Path in Denoising Diffusion Model
Ping Chen, Xingpeng Zhang, Zhaoxiang Liu et al.
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin, Zhu Xu, Yang Liu
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi, Matteo Farina, Massimiliano Mancini et al.
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
Hongye Cheng, Tianyu Wang, Guangsi Shi et al.
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.
DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing
Yufei Huang, Bangyan Liao, Yuqi Hu et al.
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng, Hang Zhou, Jing Liao et al.
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
Ziqin Huang, Gu Wang, Chenyangguang Zhang et al.
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik et al.
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Zekun Li et al.
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane, Gabrielle Ebbrecht, Ying Wang et al.
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
Luyuan Xie, Tianyu Luan, Wenyuan Cai et al.
Towards All-in-One Medical Image Re-Identification
Yuan Tian, Kaiyuan Ji, Rongzhao Zhang et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos
Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
Zhuangzhuang Chen, Hualiang Wang, Chubin Ou et al.
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
Yutao Tang, Yuxiang Guo, Deming Li et al.
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari et al.
Learnable Infinite Taylor Gaussian for Dynamic View Rendering
Bingbing Hu, Yanyan Li, Rui Xie et al.
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range
Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
Yanlong Xu, Haoxuan Qu, Jun Liu et al.
Foundations of the Theory of Performance-Based Ranking
Sébastien Piérard, Anaïs Halin, Anthony Cioppa et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
OmniStereo: Real-time Omnidirectional Depth Estimation with Multiview Fisheye Cameras
Jiaxi Deng, Yushen Wang, Haitao Meng et al.
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Yutong Wang, Jiajie Teng, Jiajiong Cao et al.
Neural Hierarchical Decomposition for Single Image Plant Modeling
Zhihao Liu, Zhanglin Cheng, Naoto Yokoya
Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On
Nannan Zhang, Yijiang Li, Dong Du et al.
Unity in Diversity: Video Editing via Gradient-Latent Purification
Junyu Gao, Kunlin Yang, Xuan Yao et al.
Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
Kaouther Messaoud, Matthieu Cord, Alex Alahi
A Regularization-Guided Equivariant Approach for Image Restoration
Yulu Bai, Jiahong Fu, Qi Xie et al.
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation
Ziyu Zhao, Xiaoguang Li, Lingjia Shi et al.
Segment Any-Quality Images with Generative Latent Space Enhancement
Guangqian Guo, Yong Guo, Xuehui Yu et al.
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization
Sihao Liu, Yibo Yang, Xiaojie Li et al.
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering
Hanxiao Sun, Yupeng Gao, Jin Xie et al.
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?
Zebin You, Xinyu Zhang, Hanzhong Guo et al.
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
Weikang Bian, Zhaoyang Huang, Xiaoyu Shi et al.
Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels
Qiming Xia, Wenkai Lin, Haoen Xiang et al.
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal, Xiang Yue, Erion Plaku et al.
Dynamic Motion Blending for Versatile Motion Editing
Nan Jiang, Hongjie Li, Ziye Yuan et al.
Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions
Boran Wen, Dingbang Huang, Zichen Zhang et al.
LightLoc: Learning Outdoor LiDAR Localization at Light Speed
Wen Li, Chen Liu, Shangshu Yu et al.
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen, Zizheng Huang, Yan Hong et al.
3D Dental Model Segmentation with Geometrical Boundary Preserving
Shufan Xi, Zexian Liu, Junlin Chang et al.
CryptoFace: End-to-End Encrypted Face Recognition
Wei Ao, Vishnu Naresh Boddeti
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu, Jianfeng Wang, Zhengyuan Yang et al.
ViiNeuS: Volumetric Initialization for Implicit Neural Surface Reconstruction of Urban Scenes with Limited Image Overlap
Hala Djeghim, Nathan Piasco, Moussab Bennehar et al.
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Siyi Du, Xinzhe Luo, Declan O'Regan et al.
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
Xinyi Zhang, Naiqi Li, Angela Dai
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou, Kai Zhang, Sai Bi et al.
Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach
Steeven Janny, Hervé Poirier, Leonid Antsfeld et al.
Simplification Is All You Need against Out-of-Distribution Overconfidence
Keke Tang, Chao Hou, Weilong Peng et al.
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation
Amin Karimi, Charalambos Poullis
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang, Leonor Fermoselle, Duy Ta et al.
Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations
Haitong Liu, Kuofeng Gao, Yang Bai et al.
Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications
Tong Bu, Maohua Li, Zhaofei Yu
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse, Jan van Gemert
Understanding Multi-Task Activities from Single-Task Videos
Yuhan Shen, Ehsan Elhamifar
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid et al.
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
Ziyue Zhu, Shenlong Wang, Jin Xie et al.
DistinctAD: Distinctive Audio Description Generation in Contexts
Bo Fang, Wenhao Wu, Qiangqiang Wu et al.
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau, Guillaume Bourmaud
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li, Bin Chen, Chen Zhao et al.