Lu Sheng
35
Papers
997
Total Citations
Papers (35)
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
806
citations
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR 2024
76
citations
MV-Adapter: Multi-View Consistent Image Generation Made Easy
ICCV 2025
69
citations
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
25
citations
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
CVPR 2025
21
citations
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
CVPR 2024
0
citations
A Generative Model for Depth-Based Robust 3D Facial Pose Tracking
CVPR 2017
0
citations
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
CVPR 2018
0
citations
Exploring Disentangled Feature Representation Beyond Face Identification
CVPR 2018
0
citations
Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration
CVPR 2018
0
citations
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
CVPR 2019
0
citations
Semantics Disentangling for Text-To-Image Generation
CVPR 2019
0
citations
Video Generation From Single Semantic Label Map
CVPR 2019
0
citations
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
CVPR 2021
0
citations
Back-Tracing Representative Points for Voting-Based 3D Object Detection in Point Clouds
CVPR 2021
0
citations
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
CVPR 2022
0
citations
Siamese DETR
CVPR 2023
0
citations
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
CVPR 2023
0
citations
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
ICCV 2017
0
citations
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM
ICCV 2019
0
citations
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
ICCV 2019
0
citations
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
ICCV 2019
0
citations
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
ICCV 2021
0
citations
StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition
ICCV 2021
0
citations
Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues
ECCV 2020
0
citations
Powering One-shot Topological NAS with Stabilized Share-parameter Proxy
ECCV 2020
0
citations
SketchSampler: Sketch-Based 3D Reconstruction via View-Dependent Depth Sampling
ECCV 2022
0
citations
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
ECCV 2022
0
citations
Improving RGB-D Point Cloud Registration by Learning Multi-Scale Local Linear Transformation
ECCV 2022
0
citations
Context and Attribute Grounded Dense Captioning
CVPR 2019
0
citations
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
CVPR 2025
0
citations
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
CVPR 2025
0
citations
Multi-Modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation
AAAI 2024
0
citations
Data-Free Generalized Zero-Shot Learning
AAAI 2024
0
citations
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS 2023
0
citations