Si Liu

Papers: 66

Total Citations: 323

Papers (66)

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Controllable Navigation Instruction Generation with Chain of Thought Prompting

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

EASE-DETR: Easing the Competition among Object Queries

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

Structural Sparse Tracking

Diversity-Induced Multi-View Subspace Clustering

SketchNet: Sketch Classification With Web Images

Structural Correlation Filter for Robust Visual Tracking

Surveillance Video Parsing With Single Frame Supervision

Learning Adaptive Receptive Fields for Deep Image Parsing Network

Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling

PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection

AdversarialNAS: Adversarial Neural Architecture Search for GANs

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Referring Image Segmentation via Cross-Modal Progressive Comprehension

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

Reformulating HOI Detection As Adaptive Set Prediction

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

General Instance Distillation for Object Detection

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

Reinforced Structured State-Evolution for Vision-Language Navigation

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

Boosting Verified Training for Robust Image Classifications via Abstraction

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Bridging Search Region Interaction With Template for RGB-T Tracking

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels

DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object Detection

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

Human Parsing With Contextualized Convolutional Neural Network

Low-Rank Tensor Constrained Multiview Subspace Clustering

RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism

Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation

Video Background Music Generation: Dataset, Method and Evaluation

Object as Query: Lifting Any 2D Object Detector to 3D Detection

Optimizing the Placement of Roadside LiDARs for Autonomous Driving

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection

Generative Map Priors for Collaborative BEV Semantic Segmentation

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs

CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance

Mining the Benefits of Two-stage and One-stage HOI Detection

Boosting Verification of Deep Reinforcement Learning via Piece-Wise Linear Decision Neural Networks

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Open Category Detection with PAC Guarantees