Xuming He

38 Papers, 0 Total Citations

Papers (38)

GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

Indoor Scene Structure Analysis for Single Image Depth Estimation

Multiclass Semantic Video Segmentation With Object-Level Active Inference

Separating Objects and Clutter in Indoor Scenes

Learning to Co-Generate Object Proposals With a Deep Structured Network

Predicting Salient Face in Multiple-Face Videos

Indoor Scene Parsing With Instance Segmentation, Semantic Labeling and Support Relationship Inference

Boundary-Aware Instance Segmentation

One-Shot Action Localization by Learning Sequence Matching Network

Geometry-Aware Deep Network for Single-Image Novel View Synthesis

SemStyle: Learning to Generate Stylised Image Captions Using Unaligned Text

Distribution Alignment: A Unified Framework for Long-Tail Visual Recognition

Bipartite Graph Network With Adaptive Message Passing for Unbiased Scene Graph Generation

DER: Dynamically Expandable Representation for Class Incremental Learning

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

General Incremental Learning With Domain-Aware Categorical Representations

SGTR: End-to-End Scene Graph Generation With Transformer

HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models

Structural Kernel Learning for Large Scale Multiclass Object Co-Detection

Deep Free-Form Deformation Network for Object-Mask Registration

Dynamic Context Correspondence Network for Semantic Alignment

Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection

GNeRF: GAN-Based Neural Radiance Field Without Posed Camera

Class-relation Knowledge Distillation for Novel Class Discovery

Grounded Image Text Matching with Mismatched Relation Reasoning

Human-centric Scene Understanding for 3D Large-scale Scenarios

Part-aware Prototype Network for Few-shot Semantic Segmentation

Learning Semantic Correspondence with Sparse Annotations

Generative Negative Text Replay for Continual Vision-Language Pretraining

Dynamic Grained Encoder for Vision Transformers

ATTA: Anomaly-aware Test-Time Adaptation for Out-of-Distribution Detection in Segmentation

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition