Xin Jin

24
Papers
215
Total Citations

Papers (24)

Language-Image Pre-training with Long Captions

ECCV 2024
63
citations

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification

AAAI 2024 (arXiv)
33
citations

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

ECCV 2024 (arXiv)
31
citations

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

CVPR 2025
22
citations

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

CVPR 2025
15
citations

Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback

ECCV 2024
12
citations

Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

ECCV 2024
12
citations

One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception

AAAI 2024 (arXiv)
9
citations

DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts

AAAI 2025
6
citations

Towards RAW Object Detection in Diverse Conditions

CVPR 2025
5
citations

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

ICCV 2025
4
citations

Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable

CVPR 2025
1
citation

ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning

ICCV 2025
1
citation

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

ICCV 2025 (arXiv)
1
citation

UniScene: Unified Occupancy-centric Driving Scene Generation

CVPR 2025
0
citations

Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models

AAAI 2025
0
citations

GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer

ICCV 2025
0
citations

UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection

CVPR 2025
0
citations

SwiftPillars: High-Efficiency Pillar Encoder for Lidar-Based 3D Detection

AAAI 2024
0
citations

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

ICCV 2025
0
citations

Diff-BGM: A Diffusion Model for Video Background Music Generation

CVPR 2024
0
citations

Inter-X: Towards Versatile Human-Human Interaction Analysis

CVPR 2024
0
citations

ReGenNet: Towards Human Action-Reaction Synthesis

CVPR 2024
0
citations

StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization

ICML 2024
0
citations