Yan Huang

Papers: 39

Total Citations: 47

Papers (39)

HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Zero-Shot Low-Light Image Enhancement via Latent Diffusion Models

Free Lunch for Gait Recognition: A Novel Relation Descriptor

Open-Vocabulary Octree-Graph for 3D Scene Understanding

Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition

Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

Investigating Compositional Challenges in Vision-Language Models for Visual Grounding

Sparse Coding for Classification via Discrimination Ensemble

Instance-Aware Image and Sentence Matching With Selective Multimodal LSTM

See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-Based Person Re-Identification

Mask-Guided Contrastive Attention Model for Person Re-Identification

Aligning Infinite-Dimensional Covariance Matrices in Reproducing Kernel Hilbert Spaces for Domain Adaptation

M3: Multimodal Memory Modelling for Video Captioning

Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model

Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

Local Relationship Learning With Person-Specific Shape Regularization for Facial Action Unit Detection

Rethinking the Heatmap Regression for Bottom-Up Human Pose Estimation

Dynamic Texture Recognition via Orthogonal Tensor Dictionary Learning

Conditional High-Order Boltzmann Machine: A Supervised Learning Model for Relation Learning

ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching

SBSGAN: Suppression of Inter-Domain Background Shift for Person Re-Identification

Clothing Status Awareness for Long-Term Person Re-Identification

PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking

Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach

Prediction and Recovery for Adaptive Low-Resolution Person Re-Identification

Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution

Learning Semantic Concepts and Order for Image and Sentence Matching

PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization

DATA: Domain-And-Time Alignment for High-Quality Feature Fusion in Collaborative Perception

TDeLTA: A Light-Weight and Robust Table Detection Method Based on Learning Text Arrangement

Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition

Context-Guided Spatio-Temporal Video Grounding

Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability

RetGK: Graph Kernels based on Return Probabilities of Random Walks

Unfolding the Alternating Optimization for Blind Super Resolution

Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-text Matching

Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation