Jianbing Shen

71 Papers

109 Total Citations

Papers (71)

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving

RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

Language Prompt for Autonomous Driving

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Fine-Grained Distillation for Long Document Retrieval

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering

Saliency-Aware Geodesic Video Object Segmentation

Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning

Salient Object Detection Driven by Fixation Prediction

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

Revisiting Video Saliency: A Large-Scale Benchmark and a New Model

Striking the Right Balance With Uncertainty

Salient Object Detection With Pyramid Attention and Salient Edges

Learning Unsupervised Video Object Segmentation Through Visual Attention

See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks

An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection

Shifting More Attention to Video Salient Object Detection

Camouflaged Object Detection

Cascaded Human-Object Interaction Recognition

Self-Learning With Rectification Strategy for Human Parsing

Probabilistic Structural Latent Representation for Unsupervised Embedding

Hierarchical Human Parsing With Typed Part-Relation Reasoning

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

Learning Video Object Segmentation From Unlabeled Videos

NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection

Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation

LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention

Structured Scene Memory for Vision-Language Navigation

Video Object Segmentation Using Global and Instance Embedding Learning

Face Forensics in the Wild

Learning To Fuse Asymmetric Feature Maps in Siamese Trackers

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation

A Graph Matching Perspective With Transformers on Video Instance Segmentation

Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection and Direction Consistency

Referring Multi-Object Tracking

Linearization to Nonlinear Learning for Visual Tracking

Super-Trajectory for Video Segmentation

Deep Cropping via Attention Box Prediction and Aesthetics Assessment

Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks

Towards Bridging Semantic Gap to Improve Semantic Segmentation

Human-Aware Motion Deblurring

Learning Compositional Neural Information Fusion for Human Parsing

Gaussian Affinity for Max-Margin Class Imbalanced Learning

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation

Full-Duplex Strategy for Video Object Segmentation

Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Video Object Segmentation with Episodic Graph Memory Networks

Weakly Supervised 3D Object Detection from Lidar Point Cloud

Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers

Active Visual Information Gathering for Vision-Language Navigation

Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-identification

Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning

Learning Disentanglement with Decoupled Labels for Vision-Language Navigation

BRNet: Exploring Comprehensive Features for Monocular Depth Estimation

Semi-Supervised 3D Object Detection with Proficient Teachers

LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning

ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models