Lu Sheng

35 Papers
997 Total Citations

Papers (35)

WorldSimBench: Towards Video Generation Models as World Simulators

ICML 2025
806 citations

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

CVPR 2024
76 citations

MV-Adapter: Multi-View Consistent Image Generation Made Easy

ICCV 2025
69 citations

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

CVPR 2025
25 citations

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

CVPR 2025
21 citations

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

CVPR 2024
0 citations

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

CVPR 2017
0 citations

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

CVPR 2018 · arXiv
0 citations

Exploring Disentangled Feature Representation Beyond Face Identification

CVPR 2018 · arXiv
0 citations

Avatar-Net: Multi-Scale Zero-Shot Style Transfer by Feature Decoration

CVPR 2018 · arXiv
0 citations

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving

CVPR 2019
0 citations

Semantics Disentangling for Text-To-Image Generation

CVPR 2019
0 citations

Video Generation From Single Semantic Label Map

CVPR 2019
0 citations

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

CVPR 2021 · arXiv
0 citations

Back-Tracing Representative Points for Voting-Based 3D Object Detection in Point Clouds

CVPR 2021 · arXiv
0 citations

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

CVPR 2022
0 citations

Siamese DETR

CVPR 2023 · arXiv
0 citations

VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

CVPR 2023
0 citations

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

ICCV 2017
0 citations

Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM

ICCV 2019
0 citations

Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization

ICCV 2019
0 citations

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

ICCV 2019
0 citations

3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

ICCV 2021
0 citations

StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition

ICCV 2021
0 citations

Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues

ECCV 2020
0 citations

Powering One-shot Topological NAS with Stabilized Share-parameter Proxy

ECCV 2020
0 citations

SketchSampler: Sketch-Based 3D Reconstruction via View-Dependent Depth Sampling

ECCV 2022
0 citations

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

ECCV 2022
0 citations

Improving RGB-D Point Cloud Registration by Learning Multi-Scale Local Linear Transformation

ECCV 2022
0 citations

Context and Attribute Grounded Dense Captioning

CVPR 2019
0 citations

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

CVPR 2025
0 citations

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

CVPR 2025
0 citations

Multi-Modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

AAAI 2024
0 citations

Data-Free Generalized Zero-Shot Learning

AAAI 2024 · arXiv
0 citations

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

NeurIPS 2023
0 citations