Jing Shao

32
Papers
947
Total Citations

Papers (32)

WorldSimBench: Towards Video Generation Models as World Simulators

ICML 2025
806
citations

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

CVPR 2024
76
citations

REEF: Representation Encoding Fingerprints for Large Language Models

ICLR 2025
31
citations

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

CVPR 2025
25
citations

EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

NeurIPS 2025 (arXiv)
9
citations

Exploring Disentangled Feature Representation Beyond Face Identification

CVPR 2018 (arXiv)
0
citations

Practical Block-Wise Neural Network Architecture Generation

CVPR 2018 (arXiv)
0
citations

Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration

CVPR 2018 (arXiv)
0
citations

Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing

CVPR 2019
0
citations

Semantics Disentangling for Text-To-Image Generation

CVPR 2019
0
citations

Video Generation From Single Semantic Label Map

CVPR 2019
0
citations

Context and Attribute Grounded Dense Captioning

CVPR 2019
0
citations

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

CVPR 2021 (arXiv)
0
citations

Siamese DETR

CVPR 2023 (arXiv)
0
citations

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

ICCV 2017
0
citations

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification

ICCV 2017
0
citations

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

ICCV 2019
0
citations

CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations

ECCV 2020
0
citations

Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues

ECCV 2020
0
citations

Learning Connectivity of Neural Networks from a Topological Perspective

ECCV 2020
0
citations

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

ECCV 2022
0
citations

Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies

ECCV 2022
0
citations

PalGAN: Image Colorization with Palette Generative Adversarial Networks

ECCV 2022
0
citations

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

ECCV 2022
0
citations

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

CVPR 2021 (arXiv)
0
citations

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models

CVPR 2025
0
citations

Deeply Learned Attributes for Crowded Scene Understanding

CVPR 2015
0
citations

Slicing Convolutional Neural Network for Crowd Video Understanding

CVPR 2016
0
citations

Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion

CVPR 2017
0
citations

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

NeurIPS 2019
0
citations

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

NeurIPS 2022
0
citations

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

NeurIPS 2023
0
citations