Hao Luo

21 Papers

117 Total Citations

Papers (21)

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Reinforcement Learning Friendly Vision-Language Model for Minecraft

Unified Multimodal Understanding via Byte-Pair Visual Encoding

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

PlayerOne: Egocentric World Simulator

Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration

Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network

DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Accelerating Parallel Sampling of Diffusion Models

Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

TransReID: Transformer-Based Object Re-Identification

Revisiting Vision Transformer from the View of Path Ensemble

Unstructured Feature Decoupling for Vehicle Re-identification

BVT-IMA: Binary Vision Transformer with Information-Modified Attention

AnyI2V: Animating Any Conditional Image with Motion Control

Preacher: Paper-to-Video Agentic System

VideoOrion: Tokenizing Object Dynamics in Videos

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer

VTC-LFC: Vision Transformer Compression with Low-Frequency Components