Xiaoqin Zhang

Papers: 16

Total Citations: 28

Papers (16)

VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

Weakly Supervised Monocular 3D Detection with a Single-View Image

PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Spatial Preference Rewarding for MLLMs Spatial Understanding

Face Retouching with Diffusion Data Generation and Spectral Restorement

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking

PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency

SGFormer: Semantic-Geometry Fusion Transformer for Multi-modal 3D Panoptic Segmentation

Masked AutoDecoder is Effective Multi-Task Vision Generalist

FAC: 3D Representation Learning via Foreground Aware Feature Contrast

DA-DETR: Domain Adaptive Detection Transformer With Information Fusion

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration

Pose-Free Neural Radiance Fields via Implicit Pose Regularization

WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields