Jing Shao
32 Papers
947 Total Citations
Papers (32)
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
806 citations
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR 2024
76 citations
REEF: Representation Encoding Fingerprints for Large Language Models
ICLR 2025
31 citations
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
25 citations
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
NeurIPS 2025
9 citations
Exploring Disentangled Feature Representation Beyond Face Identification
CVPR 2018
0 citations
Practical Block-Wise Neural Network Architecture Generation
CVPR 2018
0 citations
Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration
CVPR 2018
0 citations
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing
CVPR 2019
0 citations
Semantics Disentangling for Text-To-Image Generation
CVPR 2019
0 citations
Video Generation From Single Semantic Label Map
CVPR 2019
0 citations
Context and Attribute Grounded Dense Captioning
CVPR 2019
0 citations
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
CVPR 2021
0 citations
Siamese DETR
CVPR 2023
0 citations
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
ICCV 2017
0 citations
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification
ICCV 2017
0 citations
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
ICCV 2019
0 citations
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations
ECCV 2020
0 citations
Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues
ECCV 2020
0 citations
Learning Connectivity of Neural Networks from a Topological Perspective
ECCV 2020
0 citations
Benchmarking Omni-Vision Representation through the Lens of Visual Realms
ECCV 2022
0 citations
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies
ECCV 2022
0 citations
PalGAN: Image Colorization with Palette Generative Adversarial Networks
ECCV 2022
0 citations
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
ECCV 2022
0 citations
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
CVPR 2021
0 citations
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
CVPR 2025
0 citations
Deeply Learned Attributes for Crowded Scene Understanding
CVPR 2015
0 citations
Slicing Convolutional Neural Network for Crowd Video Understanding
CVPR 2016
0 citations
Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion
CVPR 2017
0 citations
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
NeurIPS 2019
0 citations
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
NeurIPS 2022
0 citations
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS 2023
0 citations