Jun Zhang

Papers: 29

Total Citations: 322

Affiliations: 1

Affiliations

Zhejiang University

Papers (29)

Generalized Predictive Model for Autonomous Driving

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

Task-Aware Encoder Control for Deep Video Compression

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Learn How to Query from Unlabeled Data Streams in Federated Learning

Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification

Generalized Relation Modeling for Transformer Tracking

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval With Partial Query

Attentional Pyramid Pooling of Salient Visual Residuals for Place Recognition

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering

Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

On the Convergence of an Adaptive Momentum Method for Adversarial Attacks

TransLoc4D: Transformer-based 4D Radar Place Recognition

Boosting Neural Representations for Videos with a Conditional Decoder

Training-Free Long-Context Scaling of Large Language Models

DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing

Multi-dataset Training of Transformers for Robust Action Recognition

SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval