Most Cited AAAI "training-free extrapolation" Papers
5,317 papers found
A Gaussian Filter-Based 3D Registration Method for Series Section Electron Microscopy
Zhenbang Zhang, Hongjia Li, Zhiqiang Xu et al.
Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model
Guanhao Zhao, Zhenya Huang, Cheng Cheng et al.
DeNC: Unleash Neural Codecs in Video Streaming with Diffusion Enhancement
Qihua Zhou, Ruibin Li, Jingcai Guo et al.
NumbOD: A Spatial-Frequency Fusion Attack Against Object Detectors
Ziqi Zhou, Bowen Li, Yufei Song et al.
Text-Guided Fine-grained Counterfactual Inference for Short Video Fake News Detection
Linlin Zong, Wenmin Lin, Jiahui Zhou et al.
Dynamic Interactive Bimodal Hypergraph Networks for Emotion Recognition in Conversations
Xuping Chen, Wuzhen Shi
Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency
Yuhong Chen, Ailin Song, Huifeng Yin et al.
SocialSim: Towards Socialized Simulation of Emotional Support Conversation
Zhuang Chen, Yaru Cao, Guanqun Bi et al.
Symbolic Functional Decomposition: A Reconfiguration Approach
Mateus de Oliveira Oliveira, Wim Van Den Broeck
SS-GEN: A Social Story Generation Framework with Large Language Models
Yi Feng, Mingyang Song, Jiaqi Wang et al.
MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models
Xilin He, Haijian Liang, Boyi Peng et al.
CraftFactory: A Conditioned Control Policy Benchmark for Compositional Generalization
Jinbing Hou, Youpeng Zhao, Jian Zhao
AFFAKT: A Hierarchical Optimal Transport Based Method for Affective Facial Knowledge Transfer in Video Deception Detection
Zihan Ji, Xuetao Tian, Ye Liu
Deep Reinforcement Learning with Time-Scale Invariant Memory
Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj
DARR: A Dual-Branch Arithmetic Regression Reasoning Framework for Solving Machine Number Reasoning
Chengtai Li, Yee Yang Tan, Yuting He et al.
MI-CAPTCHA: Enhance the Security of CAPTCHA Using Mooney Images
Jingmeng Li, Lukang Fu, Surun Yang et al.
Asymmetric Cross-Modal Hashing Based on Formal Concept Analysis
Yinan Li, Jun Long, Zhan Yang
Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism
Yu Liang, Wenjie Wei, Ammar Belatreche et al.
Towards More Discriminative Feature Learning in SNNs with Temporal-Self-Erasing Supervision
Wei Liu, Li Yang, Mingxuan Zhao et al.
Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning
Yan-Kai Liu, Jinyu Cai, Bao-Liang Lu et al.
SpikingYOLOX: Improved YOLOX Object Detection with Fast Fourier Convolution and Spiking Neural Networks
Wei Miao, Jiangrong Shen, Qi Xu et al.
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi, Suyu Ye, Xinyu Fang et al.
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind
Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida et al.
Knowledge-Enhanced Hierarchical Heterogeneous Graph for Personality Identification with Limited Training Data
Yuxuan Song, Qiudan Li, Yilin Wu et al.
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
Bin Tang, Ke-Qi Pan, Miao Zheng et al.
A Multi-Focus-Driven Multi-Branch Network for Robust Multimodal Sentiment Analysis
Chuanqi Tao, Jiaming Li, Tianzi Zang et al.
Alignment of CNN and Human Judgments of Geometric and Topological Concepts
Neha Upadhyay, Vijay Marupudi, Kamala Varma et al.
DDJND: Dual Domain Just Noticeable Difference in Multi-Source Content Images with Structural Discrepancy
Miaohui Wang, Zhenming Li, Wuyuan Xie
BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations
Yusong Wang, Xuanye Fang, Huifeng Yin et al.
Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier
Zachary Wojtowicz, Simon DeDeo
DepMGNN: Matrixial Graph Neural Network for Video-based Automatic Depression Assessment
Zijian Wu, Leijing Zhou, Shuanglin Li et al.
Leveraging Asynchronous Spiking Neural Networks for Ultra Efficient Event-Based Visual Processing
DingYi Zeng, Yuchen Wang, Honglin Cao et al.
Learning Concept Prerequisite Relation via Global Knowledge Relation Optimization
Miao Zhang, Jiawei Wang, Kui Xiao et al.
SalM²: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
Chunyu Zhao, Wentao Mu, Xian Zhou et al.
Look Around Before Locating: Considering Content and Structure Information for Visual Grounding
Shiyi Zheng, Peizhi Zhao, Zhilong Zheng et al.
PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation
Hengde Zhu, Xiangyu Kong, Weicheng Xie et al.
Bridge Then Begin Anew: Generating Target-Relevant Intermediate Model for Source-Free Visual Emotion Adaptation
Jiankun Zhu, Sicheng Zhao, Jing Jiang et al.
Aspect Enhancement and Text Simplification in Multimodal Aspect-Based Sentiment Analysis for Multi-Aspect and Multi-Sentiment Scenarios
Linlin Zhu, Heli Sun, Qunshu Gao et al.
Progressive Self-Learning for Domain Adaptation on Symbolic Regression of Integer Sequences
Yaohui Zhu, Kaiming Sun, Zhengdong Luo et al.
HSRDiff: A Hierarchical Self-Regulation Diffusion Model for Stochastic Semantic Segmentation
Han Yang, Chuanguang Yang, Zhulin An et al.
AQUAFace: Age-Invariant Quality Adaptive Face Recognition for Unconstrained Selfie vs ID Verification
Shivang Agarwal, Jyoti Chaudhary, Sadiq Siraj Ebrahim et al.
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
Jingkun An, Yinghao Zhu, Zongjian Li et al.
CA-MLIF: Cross-Attention and Multimodal Low-Rank Interaction Fusion Framework for Tumor Prognostic Prediction
Yajun An, Jiale Chen, Huan Lin et al.
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.
Can Generative Models Improve Self-Supervised Representation Learning?
Sana Ayromlou, Vahid Reza Khazaie, Fereshteh Forghani et al.
The Master Key Filters Hypothesis: Deep Filters Are General
Zahra Babaiee, Peyman M. Kiasari, Daniela Rus et al.
Frozen Language Models Are Gradient Coherence Rectifiers in Vision Transformers
Lichen Bai, Zixuan Xiong, Hai Lin et al.
Plug-and-Play Tri-Branch Invertible Block for Image Rescaling
Jingwei Bao, Jinhua Hao, Pengcheng Xu et al.
Dual Manifold Regularization Steered Robust Representation Learning for Point Cloud Analysis
Jian Bi, Qianliang Wu, Jianjun Qian et al.
Learning Fine-grained Domain Generalization via Hyperbolic State Space Hallucination
Qi Bi, Jingjun Yi, Haolan Zhan et al.
CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
Xiuli Bi, Jian Lu, Bo Liu et al.
FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
Lingling Cai, Kang Zhao, Hangjie Yuan et al.
Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval
Rui Cai, Zhiyu Dong, Jianfeng Dong et al.
Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue
Shuo Cai, Xinzhe Han, Shuhui Wang
Object-level Geometric Structure Preserving for Natural Image Stitching
Wenxiao Cai, Wankou Yang
Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model
Cong Cao, Huanjing Yue, Xin Liu et al.
ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects
Qihang Cao, Huangxun Chen
Deep Graph Online Hashing for Multi-Label Image Retrieval
Yuan Cao, Xiangru Chen, Zifan Liu et al.
Segment Any 3D Gaussians
Jiazhong Cen, Jiemin Fang, Chen Yang et al.
Text2Relight: Creative Portrait Relighting with Text Guidance
Junuk Cha, Mengwei Ren, Krishna Kumar Singh et al.
KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
Keng-Wei Chang, Zi-Ming Wang, Shang-Hong Lai
RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Qikai Chang, Mingjun Chen, Changpeng Pi et al.
Sharpening Neural Implicit Functions with Frequency Consolidation Priors
Chao Chen, Yu-Shen Liu, Zhizhong Han
MaskPrompt: Open-Vocabulary Affordance Segmentation with Object Shape Mask Prompts
Dongpan Chen, Dehui Kong, Jinghua Li et al.
Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion
Haipeng Chen, Yuheng Yang, Yingda Lyu
Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation
Haipeng Chen, Sifan Wu, Zhigang Wang et al.
Adversarial Learning Under Hybrid Perturbations for Robust Acute Lymphoblastic Leukemia Classification
Jie Chen, Xinyuan Liu, Xintong Liu et al.
Dual-Level Precision Edges Guided Multi-View Stereo with Accurate Planarization
Kehua Chen, Zhenlong Yuan, Tianlu Mao et al.
Contrasting Adversarial Perturbations: The Space of Harmless Perturbations
Lu Chen, Shaofeng Li, Benhao Huang et al.
CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization
Nan Chen, Mengqi Huang, Zhuowei Chen et al.
Infinite-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation
Qihua Chen, Yue Ma, Hongfa Wang et al.
Unsupervised Degradation Representation Aware Transform for Real-World Blind Image Super-Resolution
Sen Chen, Hongying Liu, Chaowei Fang et al.
Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection
Shunxin Chen, Ajian Liu, Junze Zheng et al.
Cross-View Referring Multi-Object Tracking
Sijia Chen, En Yu, Wenbing Tao
DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models
Wei Chen, Jianwei Niu, Xuefeng Liu et al.
Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning
Xingchi Chen, Zhuoran Zheng, Xuerui Li et al.
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
Xuesong Chen, Shaoshuai Shi, Tao Ma et al.
Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis
Yiliang Chen, Steven SC Ho, Cheng Xu et al.
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
Yitong Chen, Wenhao Yao, Lingchen Meng et al.
3D Measurement of Complex Textured Objects Based on Bidirectional Fringe Projection
Yuchong Chen, Jian Yu, Shaoyan Gai et al.
Unsupervised Diffusion-Based Degradation Modeling for Real-World Super-Resolution
Yuying Chen, Mingde Yao, Wenbo Li et al.
EvHDR-GS: Event-guided HDR Video Reconstruction with 3D Gaussian Splatting
Zehao Chen, Zhan Lu, De Ma et al.
VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping
Zheng Chen, Yu Zeng, Zehui Chen et al.
VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
Zhipeng Chen, Lan Yang, Yonggang Qi et al.
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen et al.
Gradient Alignment Improves Test-Time Adaptation for Medical Image Segmentation
Ziyang Chen, Yiwen Ye, Yongsheng Pan et al.
3DPGS: 3D Probabilistic Graph Search for Archaeological Piece Grouping
Junfeng Cheng, Yingkai Yang, Tania Stathaki
Effective Diffusion Transformer Architecture for Image Super-Resolution
Kun Cheng, Lei Yu, Zhijun Tu et al.
Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation
Zesen Cheng, Kehan Li, Li Hao et al.
Bridge 2D-3D: Uncertainty-aware Hierarchical Registration Network with Domain Alignment
Zhixin Cheng, Jiacheng Deng, Xinjun Li et al.
Zero-Shot Scene Change Detection
Kyusik Cho, Dong Yeop Kim, Euntai Kim
Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting
Dasol Choi, Dongbin Na
SIDL: A Real-World Dataset for Restoring Smartphone Images with Dirty Lenses
Sooyoung Choi, Sungyong Park, Heewon Kim
Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces
Wonhyeok Choi, Kyumin Hwang, Minwoo Choi et al.
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung, Seungwon Lim, Sangkyu Lee et al.
AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples
Antonio Emanuele Cinà, Jérôme Rony, Maura Pintor et al.
GCD-Sampling: A General Cross-scale Decoupled Sampling for Point Cloud
Tao Dai, Yanzi Wang, Jianyu Xiong et al.
Harmonious Music-driven Group Choreography with Trajectory-Controllable Diffusion
Yuqin Dai, Wanlu Zhu, Ronghui Li et al.
Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Image Generation
Quan Dao, Hao Phung, Trung Tuan Dao et al.
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery
Shristi Das Biswas, Matthew Shreve, Xuelu Li et al.
Single Exposure Quantitative Phase Imaging with a Conventional Microscope Using Diffusion Models
Gabriel della Maggiora, Luis Alberto Croquevielle, Harry Horsley et al.
Deep Non-Rigid Structure-from-Motion Revisited: Canonicalization and Sequence Modeling
Hui Deng, Jiawei Shi, Zhen Qin et al.
DiffCorr: Conditional Diffusion Model with Reliable Pseudo-Label Guidance for Unsupervised Point Cloud Shape Correspondence
Jiacheng Deng, Jiahao Lu, Zhixin Cheng et al.
Adaptive Siamese Masked Autoencoder with Global Optimization for Unsupervised Point Cloud Shape Correspondence
Jiacheng Deng, Jiahao Lu
OTIAS: OcTree Implicit Adaptive Sampling for Multispectral and Hyperspectral Image Fusion
Shangqi Deng, Jun Ma, Liang-Jian Deng et al.
Boundary-Aware Temporal Dynamic Pseudo-Supervision Pairs Generation for Zero-Shot Natural Language Video Localization
Xiongwen Deng, Haoyu Tang, Han Jiang et al.
Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation
Yuhui Deng, Yuqin Lu, Yangyang Xu et al.
Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models
Guanqi Ding, Chengyu Yang, Shuhui Wang et al.
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding, Shaobin Zhuang, Kunchang Li et al.
AS-Det: Active Sampling for Adaptive 3D Object Detection in Point Clouds
Ziheng Ding, Xiaze Zhang, Qi Jing et al.
GarFast: Realistic and Fast Garment Transfer with a Simplified Parser-Free Approach
Chenghu Du, Junyin Wang, Yi Rong et al.
Latent Diffusion-Enhanced Virtual Try-On via Optimized Pseudo-Label Generation
Chenghu Du, Junyin Wang, Feng Yu et al.
HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions
Keyu Du, Hao Xu, Haipeng Li et al.
A Diffusion-Based Framework for Occluded Object Movement
Zheng-Peng Duan, Jiawei Zhang, Siyu Liu et al.
IniRetinex: Rethinking Retinex-type Low-Light Image Enhancer via Initialization Perspective
Guodong Fan, Zishu Yao, Guang-Yong Chen et al.
Vision-guided Text Mining for Unsupervised Cross-modal Hashing with Community Similarity Quantization
Haozhi Fan, Yuan Cao
EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs
Zhen Fan, Peng Dai, Zhuo Su et al.
CoSDA: Enhancing the Robustness of Inversion-based Generative Image Watermarking Framework
Han Fang, Kejiang Chen, Zijin Yang et al.
SSUN-Net: Spatial-Spectral Prior-Aware Unfolding Network for Pan-Sharpening
Shijie Fang, Hongping Gan
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes
Chaoran Feng, Wangbo Yu, Xinhua Cheng et al.
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng, Yang Bai, Tao Luo et al.
Weakly Supervised Gland Segmentation with Class Semantic Consistency and Purified Labels Filtration
Siyang Feng, Huadeng Wang, Chu Han et al.
HDLayout: Hierarchical and Directional Layout Planning for Arbitrary Shaped Visual Text Generation
Tonghui Feng, Chunsheng Yan, Qianru Wang et al.
Simplifying Control Mechanism in Text-to-Image Diffusion Models
Zhida Feng, Li Chen, Yuenan Sun et al.
BGHR: Bridging the Gap Between HBox-Supervised and RBox-Supervised Oriented Object Detection via Adaptive Fine-Grained Sample Mining
Chenlin Fu, Yingying Zhu
Foundation Model Driven Appearance Extraction for Robust Multiple Object Tracking
Teng Fu, Haiyang Yu, Ke Niu et al.
MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark
Keke Gai, Dongjue Wang, Jing Yu et al.
DFDNet: Disentangling and Filtering Dynamics for Enhanced Video Prediction
Lianqiang Gan, Junyu Lai, Jingze Ju et al.
PNVC: Towards Practical INR-based Video Compression
Ge Gao, Ho Man Kwan, Fan Zhang et al.
AIM: Let Any Multimodal Large Language Models Embrace Efficient In-Context Learning
Jun Gao, Qian Qiao, Tianxiang Wu et al.
TC-LLaVA: Rethinking the Transfer of LLava from Image to Video Understanding with Temporal Considerations
Mingze Gao, Jingyu Liu, Mingda Li et al.
EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
Chengjie Ge, Xueyang Fu, Peng He et al.
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning
Shiping Ge, Qiang Chen, Zhiwei Jiang et al.
ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis
Xinyu Geng, Jiaming Wang, Xiaolin Huang et al.
MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang et al.
OT-StainNet: Optimal Transport Driven Semantic Matching for Weakly Paired H&E-to-IHC Stain Transfer
Xianchao Guan, Yifeng Wang, Ye Zhang et al.
You Should Learn to Stop Denoising on Point Clouds in Advance
Chuchen Guo, Weijie Zhou, Zheng Liu et al.
Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resection with Pringle Maneuver
Diandian Guo, Weixin Si, Zhixi Li et al.
Enhancing Low-Rank Adaptation with Recoverability-Based Reinforcement Pruning for Object Counting
Haojie Guo, Junyu Gao, Yuan Yuan
MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance
Jialong Guo, Ke Liu, Jiangchao Yao et al.
PromptDet: A Lightweight 3D Object Detection Framework with LiDAR Prompts
Kun Guo, Qiang Ling
OpenVIS: Open-vocabulary Video Instance Segmentation
Pinxue Guo, Hao Huang, Peiyang He et al.
SpikeGS: Reconstruct 3D Scene Captured by a Fast-Moving Bio-Inspired Camera
Yijia Guo, Liwen Hu, Yuanxi Bai et al.
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Yongxin Guo, Jingyu Liu, Mingda Li et al.
LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies
Ameer Hamza, Abdullah, Yong Hyun Ahn et al.
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
Wencheng Han, Dongqian Guo, Cheng-Zhong Xu et al.
ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image
Jinkun Hao, Junshu Tang, Jiangning Zhang et al.
Efficient Online Training for Zero-Shot Time-Lapse Microscopy Denoising and Super-Resolution
Ruian He, Ri Cheng, Xinkai Lyu et al.
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Xu He, Zhiyong Wu, Xiaoyu Li et al.
Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail
Yina He, Lei Peng, Yongcun Zhang et al.
FashionTailor: Controllable Clothing Editing for Human Images with Appearance Preserving
Jie Hou, Jianghong Ma, Xiangyu Mu et al.
Prompt Tuning In a Compact Attribute Space
Shiyu Hou, Tianfei Zhou, Shuai Zhang et al.
BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation
Xiaolu Hou, Mingcheng Li, Dingkang Yang et al.
Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentenglement on Style and Content References
Teng-Fang Hsiao, Bo-Kai Ruan, Hong-Han Shuai
GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution
Jintong Hu, Bin Xia, Bin Chen et al.
VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression
Qiang Hu, Houqiang Zhong, Zihan Zheng et al.
Identity-Text Video Corpus Grounding
Bin Huang, Xin Wang, Hong Chen et al.
SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Binyuan Huang, Yuqing Wen, Yucheng Zhao et al.
Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening
Jie Huang, Rui Huang, Jinghao Xu et al.
AUTE: Peer-Alignment and Self-Unlearning Boost Adversarial Robustness for Training Ensemble Models
Lifeng Huang, Tian Su, Chengying Gao et al.
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
Muye Huang, Han Lai, Xinyu Zhang et al.
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang, Siming Fu, Jinlong Liu et al.
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
Shaofei Huang, Rui Ling, Hongyu Li et al.
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors
Tianyu Huang, Haoze Zhang, Yihan Zeng et al.
Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
Wenbo Huang, Jinghui Zhang, Guang Li et al.
CLIP-RestoreX: Restore Image Structure and Perception in Exposure Correction
Xiang Huang, Qing Zhang, Jian-Fang Hu et al.
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration
Xiaoshui Huang, Zhou Huang, Yifan Zuo et al.
Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Xijie Huang, Xinyuan Wang, Hantao Zhang et al.
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection
Xun Huang, Ziyu Xu, Hai Wu et al.
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Yongle Huang, Haodong Chen, Zhenbang Xu et al.
PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model
Yunlong Huang, Junshuo Liu, Ke Xian et al.
EGSRAL: An Enhanced 3D Gaussian Splatting Based Renderer with Automated Labeling for Large-Scale Driving Scene
Yixiong Huo, Guangfeng Jiang, Hongyang Wei et al.
High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion
Junhwa Hur, Charles Herrmann, Saurabh Saxena et al.
Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
Lee Hyoseok, Kyeong Seon Kim, Kwon Byung-Ki et al.
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
Muhammet Furkan Ilaslan, Ali Köksal, Kevin Qinghong Lin et al.
Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks
Alexander Jaus, Constantin Marc Seibold, Simon Reiß et al.
Game4Loc: A UAV Geo-Localization Benchmark from Game Data
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection
Mingda Jia, Liming Zhao, Ge Li et al.
FlexiTex: Enhancing Texture Generation via Visual Guidance
Dadong Jiang, Xianghui Yang, Zibo Zhao et al.
ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling
Jianan Jiang, Hao Tang, Zhilin Jiang et al.
SCCS: Deep Neural Spectral Clustering for Self-Supervised Subcellular Structure Segmentation
Jimao Jiang, Diya Sun, Tianbing Wang et al.
Restabilizing Diffusion Models with Predictive Noise Fusion Strategy for Image Super-Resolution
Luoqian Jiang, Yong Guo, Bingna Xu et al.
Query Quantized Neural SLAM
Sijia Jiang, Jing Hua, Zhizhong Han
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin, Tianjin Huang, Yihua Zhang et al.
Pedestrian Attribute Recognition: A New Benchmark Dataset and a Large Language Model Augmented Framework
Jiandong Jin, Xiao Wang, Qian Zhu et al.
A Method for Enhancing Generalization of Adam by Multiple Integrations
Long Jin, Han Nong, Liangming Chen et al.
Bridging the Semantic Granularity Gap Between Text and Frame Representations for Partially Relevant Video Retrieval
WooJin Jun, WonJun Moon, Cheol-Ho Cho et al.
CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis
Gyeongjin Kang, Younggeun Lee, Seungjun Oh et al.
DiffusionREC: Diffusion Model with Adaptive Condition for Referring Expression Comprehension
Jingcheng Ke, Waikeung Wong, Jia Wang et al.
PLATYPUS: Progressive Local Surface Estimator for Arbitrary-Scale Point Cloud Upsampling
Donghyun Kim, Hyeonkyeong Kwon, Yumin Kim et al.
Generalized Zero-Shot Learning for Point Cloud Segmentation with Evidence-Based Dynamic Calibration
Hyeonseok Kim, Byeongkeun Kang, Yeejin Lee
APR-RD: Complemental Two Steps for Self-Supervised Real Image Denoising
Hyunjun Kim, Nam Ik Cho
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim, Jungbin Cho, Joonho Park et al.
ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder
Jungho Kim, Changwon Kang, Dongyoung Lee et al.
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Seyeon Kim, Siyoon Jin, Jihye Park et al.
TSDF-Based Efficient Motion-Compensated Temporal Interpolation for 3D Dynamic Sequences
Soowoong Kim, Minseong Kwon, Junho Choi et al.
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
Taewhan Kim, Soeun Lee, Si-Woo Kim et al.
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
Hyun-kyu Ko, Dongheok Park, Youngin Park et al.
UniDet3D: Multi-dataset Indoor 3D Object Detection
Maksim Kolodiazhnyi, Anna Vorontsova, Matvey Skripkin et al.