Junshu Sun

Papers: 4

Total Citations: 9

Papers (4)

Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval

Video Language Model Pretraining with Spatio-temporal Masking

VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

NeurIPS 2025 (arXiv)

Relieving the Over-Aggregating Effect in Graph Transformers