Si Liu

66 Papers
323 Total Citations

Papers (66)

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

CVPR 2015
168
citations

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

CVPR 2025
54
citations

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

ECCV 2024 (arXiv)
51
citations

Mixture Compressor for Mixture-of-Experts LLMs Gains More

ICLR 2025
23
citations

Controllable Navigation Instruction Generation with Chain of Thought Prompting

ECCV 2024
16
citations

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

NeurIPS 2025
8
citations

FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering

CVPR 2025
2
citations

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

ICCV 2025
1
citations

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

CVPR 2024
0
citations

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

CVPR 2024
0
citations

EASE-DETR: Easing the Competition among Object Queries

CVPR 2024
0
citations

Communication-Efficient Collaborative Perception via Information Filling with Codebook

CVPR 2024
0
citations

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

CVPR 2024
0
citations

Structural Sparse Tracking

CVPR 2015
0
citations

Diversity-Induced Multi-View Subspace Clustering

CVPR 2015
0
citations

SketchNet: Sketch Classification With Web Images

CVPR 2016
0
citations

Structural Correlation Filter for Robust Visual Tracking

CVPR 2016
0
citations

Surveillance Video Parsing With Single Frame Supervision

CVPR 2017 (arXiv)
0
citations

Learning Adaptive Receptive Fields for Deep Image Parsing Network

CVPR 2017
0
citations

Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling

CVPR 2019
0
citations

PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection

CVPR 2020 (arXiv)
0
citations

AdversarialNAS: Adversarial Neural Architecture Search for GANs

CVPR 2020 (arXiv)
0
citations

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

CVPR 2020 (arXiv)
0
citations

Referring Image Segmentation via Cross-Modal Progressive Comprehension

CVPR 2020 (arXiv)
0
citations

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

CVPR 2020 (arXiv)
0
citations

Reformulating HOI Detection As Adaptive Set Prediction

CVPR 2021 (arXiv)
0
citations

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

CVPR 2021
0
citations

General Instance Distillation for Object Detection

CVPR 2021 (arXiv)
0
citations

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

CVPR 2021 (arXiv)
0
citations

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

CVPR 2021 (arXiv)
0
citations

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

CVPR 2022 (arXiv)
0
citations

Reinforced Structured State-Evolution for Vision-Language Navigation

CVPR 2022 (arXiv)
0
citations

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

CVPR 2022
0
citations

Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

CVPR 2022
0
citations

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

CVPR 2022
0
citations

Boosting Verified Training for Robust Image Classifications via Abstraction

CVPR 2023 (arXiv)
0
citations

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

CVPR 2023 (arXiv)
0
citations

Bridging Search Region Interaction With Template for RGB-T Tracking

CVPR 2023
0
citations

Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation

CVPR 2023
0
citations

Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels

CVPR 2023 (arXiv)
0
citations

DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object Detection

CVPR 2023 (arXiv)
0
citations

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

CVPR 2025
0
citations

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

ICCV 2015
0
citations

Human Parsing With Contextualized Convolutional Neural Network

ICCV 2015
0
citations

Low-Rank Tensor Constrained Multiview Subspace Clustering

ICCV 2015
0
citations

RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

ICCV 2019
0
citations

Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism

ICCV 2021
0
citations

Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation

ICCV 2023 (arXiv)
0
citations

Video Background Music Generation: Dataset, Method and Evaluation

ICCV 2023 (arXiv)
0
citations

Object as Query: Lifting Any 2D Object Detector to 3D Detection

ICCV 2023 (arXiv)
0
citations

Optimizing the Placement of Roadside LiDARs for Autonomous Driving

ICCV 2023
0
citations

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

ECCV 2020
0
citations

PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation

ECCV 2022
0
citations

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

ECCV 2022
0
citations

Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection

CVPR 2023 (arXiv)
0
citations

Generative Map Priors for Collaborative BEV Semantic Segmentation

CVPR 2025
0
citations

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

CVPR 2025
0
citations

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs

ICCV 2025
0
citations

CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation

ICCV 2025
0
citations

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization

ICCV 2025
0
citations

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

AAAI 2025
0
citations

GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance

AAAI 2025
0
citations

Mining the Benefits of Two-stage and One-stage HOI Detection

NeurIPS 2021
0
citations

Boosting Verification of Deep Reinforcement Learning via Piece-Wise Linear Decision Neural Networks

NeurIPS 2023
0
citations

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

NeurIPS 2023
0
citations

Open Category Detection with PAC Guarantees

ICML 2018
0
citations