Yansong Tang

49 Papers
307 Total Citations

Papers (49)

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

ECCV 2024
117
citations

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

CVPR 2024
44
citations

FlowIE: Efficient Image Enhancement via Rectified Flow

CVPR 2024
31
citations

Universal Segmentation at Arbitrary Granularity with Language Instruction

CVPR 2024
30
citations

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

ICLR 2025
17
citations

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

CVPR 2024
16
citations

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

ICCV 2025
11
citations

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

CVPR 2024
9
citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

CVPR 2025
9
citations

Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation

ECCV 2024
8
citations

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes

CVPR 2025
7
citations

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

ECCV 2024
4
citations

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing

CVPR 2025
2
citations

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

ICCV 2025
2
citations

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

CVPR 2024
0
citations

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition

CVPR 2018
0
citations

COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis

CVPR 2019
0
citations

Uncertainty-Aware Score Distribution Learning for Action Quality Assessment

CVPR 2020
0
citations

BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion

CVPR 2022
0
citations

Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning

CVPR 2022
0
citations

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

CVPR 2022
0
citations

DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting

CVPR 2022
0
citations

VoCo-LLaMA: Towards Vision Compression with Large Language Models

CVPR 2025
0
citations

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment

CVPR 2023
0
citations

FLAG3D: A 3D Fitness Activity Dataset With Language Instruction

CVPR 2023
0
citations

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation

ICCV 2023
0
citations

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

ICCV 2023
0
citations

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

ICCV 2023
0
citations

Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer

ICCV 2023
0
citations

ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

ECCV 2022
0
citations

Global Spectral Filter Memory Network for Video Object Segmentation

ECCV 2022
0
citations

YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset

CVPR 2022
0
citations

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

CVPR 2025
0
citations

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion

ICCV 2025
0
citations

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

ICCV 2025
0
citations

KV-Edit: Training-Free Image Editing for Precise Background Preservation

ICCV 2025
0
citations

AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation

ICCV 2025
0
citations

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

ICCV 2025
0
citations

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis

AAAI 2025
0
citations

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding

AAAI 2024
0
citations

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding

AAAI 2024
0
citations

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

CVPR 2024
0
citations

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

CVPR 2024
0
citations

Towards Accurate Post-training Quantization for Diffusion Models

CVPR 2024
0
citations

Segment and Caption Anything

CVPR 2024
0
citations

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

NeurIPS 2022
0
citations

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

NeurIPS 2022
0
citations

MCUFormer: Deploying Vision Tranformers on Microcontrollers with Limited Memory

NeurIPS 2023
0
citations

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

NeurIPS 2023
0
citations