Houqiang Li

Papers: 83

Total Citations: 153

Papers (83)

EG4D: Explicit Generation of 4D Object without Score Distillation

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Long-term Temporal Context Gathering for Neural Video Compression

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Revisiting Open-Set Panoptic Segmentation

KGDM: A Diffusion Model to Capture Multiple Relation Semantics for Knowledge Graph Embedding

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Generative Latent Coding for Ultra-Low Bitrate Image Compression

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition

SOM: Semantic Obviousness Metric for Image Quality Assessment

Comparative Deep Learning of Hybrid Representations for Image Recommendations

Jointly Modeling Embedding and Translation to Bridge Video and Language

Video Captioning With Transferred Semantic Attributes

Feature Selective Networks for Object Detection

Multi-Cue Correlation Filters for Robust Visual Tracking

Towards Open-Set Identity Preserving Face Synthesis

Unsupervised Deep Tracking

Iterative Alignment Network for Continuous Sign Language Recognition

Quantization Networks

M-LVC: Multiple Frames Prediction for Learned Video Compression

Transformation GAN for Unsupervised Image Synthesis and Representation Learning

Improving Sign Language Translation With Monolingual Data by Sign Back-Translation

Representing Videos As Discriminative Sub-Graphs for Action Recognition

Unsupervised Pre-Training for Person Re-Identification

Model-Aware Gesture-to-Gesture Translation

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

Revisiting Knowledge Distillation: An Inheritance and Exploration Framework

Uformer: A General U-Shaped Transformer for Image Restoration

Contextual Similarity Distillation for Asymmetric Image Retrieval

Large-Scale Pre-Training for Person Re-Identification With Noisy Labels

Domain-Agnostic Prior for Transfer Semantic Segmentation

Asymmetric Feature Fusion for Image Retrieval

Human Pose As Compositional Tokens

Stare at What You See: Masked Image Modeling Without Reconstruction

AltFreezing for More General Video Face Forgery Detection

HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training

Relation Distillation Networks for Video Object Detection

Joint Inductive and Transductive Learning for Video Object Segmentation

SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition

Conditional DETR for Fast Training Convergence

Instance-Wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation

3D Local Convolutional Neural Networks for Gait Recognition

Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval

TransVG: End-to-End Visual Grounding With Transformers

Sign Language Translation with Iterative Prototype

DIRE for Diffusion-Generated Image Detection

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

Focus on Your Target: A Dual Teacher-Student Framework for Domain-Adaptive Semantic Segmentation

Masked Motion Predictors are Strong 3D Action Representation Learners

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection

CMD: Self-Supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation

TAPE: Task-Agnostic Prior Embedding for Image Restoration

CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds

MVP: Multimodality-Guided Visual Pre-training

Geometric Representation Learning for Document Image Rectification

Motion Information Propagation for Neural Video Compression

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

Towards Practical Real-Time Neural Video Compression

OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation

Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments

S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Dual Progressive Prototype Network for Generalized Zero-Shot Learning

Contextual Similarity Aggregation with Self-attention for Visual Re-ranking

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Hand-Object Interaction Image Generation

Multi-Agent First Order Constrained Optimization in Policy Space

CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection

Hierarchical Multi-Agent Skill Discovery

State Sequences Prediction via Fourier Transform for Representation Learning

DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning