Shuangrui Ding

Papers: 13

Total Citations: 106

Papers (13)

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

Keyframe-Guided Creative Video Inpainting

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

Static and Dynamic Concepts for Self-Supervised Video Representation Learning

AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging

Enhancing Self-Supervised Video Representation Learning via Multi-Level Feature Optimization

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

Towards More Practical Adversarial Attacks on Graph Neural Networks