Xin Jin
24 Papers
215 Total Citations
Papers (24)
Language-Image Pre-training with Long Captions
ECCV 2024
63 citations
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification
AAAI 2024
33 citations
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
ECCV 2024
31 citations
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
CVPR 2025
22 citations
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
CVPR 2025
15 citations
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
ECCV 2024
12 citations
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
ECCV 2024
12 citations
One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception
AAAI 2024
9 citations
DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts
AAAI 2025
6 citations
Towards RAW Object Detection in Diverse Conditions
CVPR 2025
5 citations
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
ICCV 2025
4 citations
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
CVPR 2025
1 citation
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
ICCV 2025
1 citation
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
ICCV 2025
1 citation
UniScene: Unified Occupancy-centric Driving Scene Generation
CVPR 2025
0 citations
Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models
AAAI 2025
0 citations
GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer
ICCV 2025
0 citations
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
CVPR 2025
0 citations
SwiftPillars: High-Efficiency Pillar Encoder for Lidar-Based 3D Detection
AAAI 2024
0 citations
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
ICCV 2025
0 citations
Diff-BGM: A Diffusion Model for Video Background Music Generation
CVPR 2024
0 citations
Inter-X: Towards Versatile Human-Human Interaction Analysis
CVPR 2024
0 citations
ReGenNet: Towards Human Action-Reaction Synthesis
CVPR 2024
0 citations
StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization
ICML 2024
0 citations