Most Cited 2025 "hardware robotic control" Papers
22,274 papers found • Page 81 of 112
Conference
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang, Wenzhao Zheng, Dalong Du et al.
Generative Model Inversion Through the Lens of the Manifold Hypothesis
Xiong Peng, Bo Han, Fengfei Yu et al.
Lidar Waveforms are Worth 40x128x33 Words
Dominik Scheuble, Hanno Holzhüter, Steven Peters et al.
Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation
Pierre-André Brousseau, Sébastien Roy
InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning
Haotian Chi, Zeyu Feng, Yueming LYU et al.
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.
High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach
Yuchong Chen, Jian Yu, Shaoyan Gai et al.
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul Paik, Debashis Sen
Polarimetric Neural Field via Unified Complex-Valued Wave Representation
Chu Zhou, Yixin Yang, Junda Liao et al.
Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering
Siddharth Tourani, Jayaram Reddy, Akash Kumbar et al.
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang, Xu Zhao
SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Maximilian Pittner, Joel Janai, Mario Faigle et al.
Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
Mengkun She, Felix Seegräber, David Nakath et al.
Super Resolved Imaging with Adaptive Optics
Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.
HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network
Juhyung Ha, Vibhas Vats, Alimoor Reza et al.
Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment
Qingqian Yang, Peishen Yan, Xiaoyu Wu et al.
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei, Jinxi Li, Yafei YANG et al.
Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling
Wenting Luan, Siqi Lu, Yongbin Zheng et al.
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia et al.
Knowledge Distillation for Learned Image Compression
Yunuo Chen, Zezheng Lyu, Bing He et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Xiangchen Song, Jiaqi Sun, Zijian Li et al.
NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals
Jiro Abe, Gaku Nakano, Kazumine Ogura
InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
Zhuoran Yang, Xi Guo, Chenjing Ding et al.
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu, Peijin Wang, Hanbo Bi et al.
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao et al.
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
Tomer Garber, Tom Tirer
GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views
Hang Yang, Le Hui, Jianjun Qian et al.
Teeth Reconstruction and Performance Capture Using a Phone Camera
Weixi Zheng, Jingwang Ling, Zhibo Wang et al.
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai, Menghan Xia, Xiao Fu et al.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang, Minjae Seong, Jisong Kim et al.
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao, Lidong Yu, Zhidan Liu et al.
ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions
Yue Huang, Zhengzhe Jiang, Xiaonan Luo et al.
DAA*: Deep Angular A Star for Image-based Path Planning
Zhiwei Xu
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang, June Suk Choi, Jaehyeong Jo et al.
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
JIXUAN FAN, Wanhua Li, Yifei Han et al.
ROAR: Reducing Inversion Error in Generative Image Watermarking
Hanyi Wang, Han Fang, Shi-Lin Wang et al.
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du, Hui Li, Han Xu et al.
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella, Davide Talon, Ziyue Liu et al.
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas et al.
Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction
Tuo Feng, Wenguan Wang, Yi Yang
Spatially-Varying Autofocus
Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole
Event-based Visual Vibrometry
Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti et al.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.
Pathways on the Image Manifold: Image Editing via Video Generation
Noam Rotstein, Gal Yona, Daniel Silver et al.
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim, Jae Young Lee, Gyojin Han et al.
Correlation Dimension of Autoregressive Large Language Models
Xin Du, Kumiko Tanaka-Ishii
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu, Wei Hu
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
Yuran Wang, Yingping Liang, Yutao Hu et al.
FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction
Donghyun Lee, Dawoon Jeong, Jae W. Lee et al.
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu, Runze Wang, Bin Ren et al.
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.
Scene-agnostic Pose Regression for Visual Localization
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye, Yushuang Wu, Ziteng Lu et al.
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao SHEN et al.
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo, Yanyan Li, Gim Hee Lee
SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations
Qi Zhang, Chi Huang, Qian Zhang et al.
Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning
Melanie Götz, Torsten Krauß, Alexandra Dmitrienko
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad, Maha Shadaydeh, Joachim Denzler
PointGAC: Geometric-Aware Codebook for Masked Point Modeling
Abiao Li, Chenlei Lv, Guofeng Mei et al.
Angular Constraint Embedding via SpherePair Loss for Constrained Clustering
Shaojie Zhang, Ke Chen
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.
Scaling Transformer-Based Novel View Synthesis with Models Token Disentanglement and Synthetic Data
Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo et al.
GLVD: Guided Learned Vertex Descent
Pol Caselles RIco, Francesc Moreno-Noguer
Customizing Domain Adapters for Domain Generalization
Yuyang Ji, Zeyi Huang, Haohan Wang et al.
RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters
Xiaolin Liu, Tianyi zhou, Hongbo Kang et al.
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
Ying Xue, Jiaxi Jiang, Rayan Armani et al.
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu et al.
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
Jingfeng Wu, Pierre Marion, Peter Bartlett
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi, Shaoshuai Shi, Kehua Sheng et al.
MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model
Yaoye Zhu, Zhe Wang, Yan Wang
Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning
Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King et al.
PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function
Qikui Zhu
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
Axis-level Symmetry Detection with Group-Equivariant Representation
Wongyun Yu, Ahyun Seo, Minsu Cho
PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon et al.
AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving
Jiawei Xu, Kai Deng, Zexin Fan et al.
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
Zixuan Hu, Dongxiao Li, Xinzhu Ma et al.
HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang, hao zou, Bochen Wang et al.
PhysAnimator: Physics-Guided Generative Cartoon Animation
Tianyi Xie, Yiwei Zhao, Ying Jiang et al.
Inverse Image-Based Rendering for Light Field Generation from Single Images
Hyunjun Jung, Hae-Gon Jeon
FlowStyler: Artistic Video Stylization via Transformation Fields Transports
YuNing Gong, Jiaming Chen, Xiaohua Ren et al.
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
Jin Hu, Mingjia Li, Xiaojie Guo
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
WU Sitong, Haoru Tan, Yukang Chen et al.
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
Hoang Phan, Tung Lam Tran, Quyen Tran et al.
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
Zonglin Di, Jing Shi, Yifei Fan et al.
FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling
Zhenghao Gao, Shengjie Xu, Zijing Li et al.
Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm
Hongqiu Wang, Wu Chen, Xiangde Luo et al.
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu, Dong Li, Yuxin Ma et al.
Federated Continuous Category Discovery and Learning
Lixu Wang, Chenxi Liu, Junfeng Guo et al.
DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection
Aashish Sharma
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.
BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration
Hanyuan Liu, Chengze Li, Minshan Xie et al.
Rethinking Key-frame-based Micro-expression Recognition: A Robust and Accurate Framework Against Key-frame Errors
Zheyuan Zhang, Weihao Tang, Hong Chen
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
Hongwei Lin, Dongyu Pan, Qiming Xia et al.
What we need is explicit controllability: Training 3D gaze estimator using only facial images
Tingwei Li, Jun Bao, Zhenzhong Kuang et al.
SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance
Wenjin Zhang, Xinyu Li, Chenyang Gao et al.
Unbiased Missing-modality Multimodal Learning
Ruiting Dai, Chenxi Li, Yandong Yan et al.
Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection
Xuehan Chen, Guangyu Ren, Tianhong Dai et al.
Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang, Bowen Zhao, Zhengming Ding et al.
Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking
Hyunseop Kim, Juheon Jeong, Hanul Kim et al.
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
Jing Wang, Rui Zhao, Ruiqin Xiong et al.
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen, Yizhou Wang, SHIXIANG TANG et al.
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.
MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou, Tong Wang, Yunkai Ma et al.
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie, Hao Tang, Bin Duan et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Learning an Implicit Physics Model for Image-based Fluid Simulation
Emily Jia, Jiageng Mao, Zhiyuan Gao et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao, Xiangbo Shu, Xin Jiang et al.
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.
SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses
Federico D'Agostino, Lisa Schwetlick, Matthias Bethge et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
MambaML: Exploring State Space Models for Multi-Label Image Classification
Xuelin Zhu, Jian liu, Jiuxin Cao et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan MA, Bohan An, Ao Shen et al.
Towards Robustness of Person Search against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An et al.
MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation
Kaleab Kinfu, Rene Vidal
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu, Ying Wen, Longzhen Yang et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.
SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.
ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios
Jun Yin, Pengyu Zeng, Licheng Shen et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement
Mostofa Rafid Uddin, Jana Armouti, Min Xu
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang et al.
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
YITING LI, Fayao Liu, Jingyi Liao et al.
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng et al.
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation
Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.
Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Hancheng Min, Zhihui Zhu, Rene Vidal
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang et al.
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.
Similarity Memory Prior is All You Need for Medical Image Segmentation
Hao Tang, Zhiqing Guo, Liejun Wang et al.
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen, Wen Shen, Zhihua Wei et al.
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan et al.
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection
Xingjian Wang, Li Chai, Jiming Chen
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Shurong Zheng, Yousong Zhu et al.
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares
Lachlan MacDonald, Hancheng Min, Leandro Palma et al.
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma, Junzhe Jiang, Yurui Chen et al.
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo et al.
CLIPSym: Delving into Symmetry Detection with CLIP
Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu et al.
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
Enming Zhang, Yuzhe Li, Yuliang Liu et al.
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng, Xin Jiang, Zechao Li
Attention on the Sphere
Boris Bonev, Max Rietmann, Andrea Paris et al.
Federated Domain Generalization with Domain-specific Soft Prompts Generation
Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.
Removing Out-of-Focus Reflective Flares via Color Alignment
Fengbo Lan, Chang Wen Chen
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen, Lei Zhang, Yakun Niu
DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang