Jing Zhang

Papers: 43

Total Citations: 1,796

Affiliations: 2

Affiliations

Hefei University of Technology

Gent University-imec

Papers (43)

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

Question Calibration and Multi-Hop Modeling for Temporal Question Answering

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

Decomposing Semantic Shifts for Composed Image Retrieval

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Probability Density Geodesics in Image Diffusion Latent Space

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

Adversarial Exploitation of Data Diversity Improves Visual Localization

Patch-level Sounding Object Tracking for Audio-Visual Question Answering

Multi-axis Prompt and Multi-dimension Fusion Network for All-in-one Weather-degraded Image Restoration

UAWTrack: Universal 3D Single Object Tracking in Adverse Weather

Semi-supervised Infrared Small Target Detection with Thermodynamic-Inspired Uneven Perturbation and Confidence Adaptation

MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection

Highly Imperceptible Black-Box Graph Injection Attacks with Reinforcement Learning

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?

Synergistic Prompting for Robust Visual Recognition with Missing Modalities

GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

Multi-Modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Data-Free Generalized Zero-Shot Learning

Adversarial Purification with the Manifold Hypothesis

Quantum-Inspired Neural Network with Runge-Kutta Method

LaViP: Language-Grounded Visual Prompting

Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection

Empowering LLMs to Understand and Generate Complex Vector Graphics

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining

SVGDreamer: Text Guided SVG Generation with Diffusion Model

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Rethink Sparse Signals for Pose-guided Text-to-image Generation