Yansong Tang
49
Papers
307
Total Citations
Papers (49)
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
ECCV 2024
117
citations
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
CVPR 2024
44
citations
FlowIE: Efficient Image Enhancement via Rectified Flow
CVPR 2024
31
citations
Universal Segmentation at Arbitrary Granularity with Language Instruction
CVPR 2024
30
citations
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
ICLR 2025
17
citations
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
CVPR 2024
16
citations
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
ICCV 2025
11
citations
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024
9
citations
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
CVPR 2025, arXiv
9
citations
Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
ECCV 2024
8
citations
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
CVPR 2025
7
citations
Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
ECCV 2024
4
citations
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
CVPR 2025
2
citations
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
ICCV 2025, arXiv
2
citations
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
CVPR 2024
0
citations
Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition
CVPR 2018
0
citations
COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis
CVPR 2019
0
citations
Uncertainty-Aware Score Distribution Learning for Action Quality Assessment
CVPR 2020, arXiv
0
citations
BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion
CVPR 2022
0
citations
Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
CVPR 2022
0
citations
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
CVPR 2022, arXiv
0
citations
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting
CVPR 2022, arXiv
0
citations
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
0
citations
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
CVPR 2023
0
citations
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
CVPR 2023, arXiv
0
citations
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
ICCV 2023, arXiv
0
citations
Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning
ICCV 2023
0
citations
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
ICCV 2023, arXiv
0
citations
Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer
ICCV 2023
0
citations
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer
ECCV 2022
0
citations
Global Spectral Filter Memory Network for Video Object Segmentation
ECCV 2022
0
citations
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
CVPR 2022
0
citations
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
CVPR 2025
0
citations
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
ICCV 2025
0
citations
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
ICCV 2025
0
citations
KV-Edit: Training-Free Image Editing for Precise Background Preservation
ICCV 2025
0
citations
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
ICCV 2025
0
citations
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
ICCV 2025
0
citations
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
0
citations
CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding
AAAI 2024
0
citations
Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding
AAAI 2024
0
citations
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
CVPR 2024
0
citations
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
CVPR 2024
0
citations
Towards Accurate Post-training Quantization for Diffusion Models
CVPR 2024
0
citations
Segment and Caption Anything
CVPR 2024
0
citations
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
NeurIPS 2022
0
citations
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
NeurIPS 2022
0
citations
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
NeurIPS 2023
0
citations
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
NeurIPS 2023
0
citations