Xu Yang

Papers: 40

Total Citations: 425

Papers (40)

Learning Progressive Joint Propagation for Human Motion Prediction

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

How to Configure Good In-Context Sequence for Visual Question Answering

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

MemoNav: Working Memory Model for Visual Navigation

Mimic In-Context Learning for Multimodal Tasks

Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation

Building Variable-Sized Models via Learngene Pool

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Vision Transformers as Probabilistic Expansion from Learngene

One Meta-tuned Transformer is What You Need for Few-shot Learning

Auto-Encoding Scene Graphs for Image Captioning

Multi-Scale Fusion Subspace Clustering Using Similarity Constraint

SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network

Nearest Neighbor Matching for Deep Clustering

Causal Attention for Vision-Language Tasks

Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency

Show, Deconfound and Tell: Image Captioning With Causal Inference

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Learning to Collocate Neural Modules for Image Captioning

Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection

Unpaired Image Captioning via Scene Graph Alignments

Auto-Parsing Network for Image Captioning and Visual Question Answering

Learning Trajectory-Word Alignments for Video-Language Tasks

Deep Spectral Clustering Using Dual Autoencoder Network

Number it: Temporal Grounding Videos like Flipping Manga

Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

Democratizing High-Fidelity Co-Speech Gesture Video Generation

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Inheriting Generalized Learngene for Efficient Knowledge Transfer across Multiple Tasks

Transformer as Linear Expansion of Learngene

A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

Adversarial Learning for Robust Deep Clustering

Exploring Diverse In-Context Configurations for Image Captioning

Learning From Biased Soft Labels