Songyang Zhang

22 Papers
33 Total Citations

Papers (22)

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

AAAI 2025
26
citations

Rethinking Verification for LLM Code Generation: From Generation to Testing

NeurIPS 2025
7
citations

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

CVPR 2024
0
citations

FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

ICML 2024
0
citations

Predicting Salient Face in Multiple-Face Videos

CVPR 2017
0
citations

Distribution Alignment: A Unified Framework for Long-Tail Visual Recognition

CVPR 2021 (arXiv)
0
citations

Bipartite Graph Network With Adaptive Message Passing for Unbiased Scene Graph Generation

CVPR 2021 (arXiv)
0
citations

The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

CVPR 2022
0
citations

SGTR: End-to-End Scene Graph Generation With Transformer

CVPR 2022 (arXiv)
0
citations

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

CVPR 2023
0
citations

Dynamic Context Correspondence Network for Semantic Alignment

ICCV 2019
0
citations

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

ICCV 2021 (arXiv)
0
citations

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

ICCV 2023 (arXiv)
0
citations

Part-aware Prototype Network for Few-shot Semantic Segmentation

ECCV 2020
0
citations

Expanding Language-Image Pretrained Models for General Video Recognition

ECCV 2022
0
citations

Action Quality Assessment with Temporal Parsing Transformer

ECCV 2022
0
citations

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

ECCV 2022
0
citations

Learning Semantic Correspondence with Sparse Annotations

ECCV 2022
0
citations

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

ICCV 2025
0
citations

DualGFL: Federated Learning with a Dual-Level Coalition-Auction Game

AAAI 2025
0
citations

Dynamic Grained Encoder for Vision Transformers

NeurIPS 2021
0
citations

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

ICML 2019
0
citations