Songyang Zhang
22
Papers
33
Total Citations
Papers (22)
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
26
citations
Rethinking Verification for LLM Code Generation: From Generation to Testing
NeurIPS 2025
7
citations
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
CVPR 2024
0
citations
FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
ICML 2024
0
citations
Predicting Salient Face in Multiple-Face Videos
CVPR 2017
0
citations
Distribution Alignment: A Unified Framework for Long-Tail Visual Recognition
CVPR 2021 (arXiv)
0
citations
Bipartite Graph Network With Adaptive Message Passing for Unbiased Scene Graph Generation
CVPR 2021 (arXiv)
0
citations
The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
CVPR 2022
0
citations
SGTR: End-to-End Scene Graph Generation With Transformer
CVPR 2022 (arXiv)
0
citations
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
0
citations
Dynamic Context Correspondence Network for Semantic Alignment
ICCV 2019
0
citations
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
ICCV 2021 (arXiv)
0
citations
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
ICCV 2023 (arXiv)
0
citations
Part-aware Prototype Network for Few-shot Semantic Segmentation
ECCV 2020
0
citations
Expanding Language-Image Pretrained Models for General Video Recognition
ECCV 2022
0
citations
Action Quality Assessment with Temporal Parsing Transformer
ECCV 2022
0
citations
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
ECCV 2022
0
citations
Learning Semantic Correspondence with Sparse Annotations
ECCV 2022
0
citations
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
0
citations
DualGFL: Federated Learning with a Dual-Level Coalition-Auction Game
AAAI 2025
0
citations
Dynamic Grained Encoder for Vision Transformers
NeurIPS 2021
0
citations
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
ICML 2019
0
citations