Yu Liu

105 Papers

636 Total Citations

Papers (105)

Combinatorial Multi-Armed Bandit with General Reward Functions

NeurIPS 2016 (arXiv)

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Space Group Constrained Crystal Generation

Learning Where to Focus for Efficient Video Object Detection

Universal Actions for Enhanced Embodied Foundation Models

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Lipschitz Singularities in Diffusion Models

Improved Video VAE for Latent Video Diffusion Model

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting

Unsupervised Sequence Classification using Sequential Output Statistics

NeurIPS 2017 (arXiv)

IDEA-Bench: How Far are Generative Models from Professional Designing?

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics

AUC Optimization from Multiple Unlabeled Datasets

See Further When Clear: Curriculum Consistency Model

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis

CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Learning Relaxed Deep Supervision for Better Edge Detection

Quality Aware Network for Set to Set Recognition

Scale-Aware Face Detection

Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning

MoNet: Deep Motion Exploitation for Video Object Segmentation

Exploring Disentangled Feature Representation Beyond Face Identification

Beyond Trade-Off: Accelerate FCN-Based Face Detector With Higher Accuracy

RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion

Conditional Adversarial Generative Flow for Controllable Image Synthesis

Anisotropic Convolutional Networks for 3D Semantic Scene Completion

Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

Search to Distill: Pearls Are Everywhere but Not the Eyes

DPGN: Distribution Propagation Graph Network for Few-Shot Learning

Revisiting the Sibling Head in Object Detector

Communication Efficient SGD via Gradient Sampling With Bayes Prior

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Lifelong Person Re-Identification via Adaptive Knowledge Accumulation

Self-Supervised Video Representation Learning by Context and Motion Decoupling

Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way

MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

Long-Term Visual Localization With Mobile Sensors

Dimensionality-Varying Diffusion Process

ReasonNet: End-to-End Driving With Temporal and Global Reasoning

Recurrent Scale Approximation for Object Detection in CNN

Learning a Recurrent Residual Fusion Network for Multimodal Matching

Knowledge Distillation via Route Constrained Optimization

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Differentiable Kernel Evolution

Correlation Congruence for Knowledge Distillation

Scalable Place Recognition Under Appearance Change for Autonomous Driving

Switchable K-Class Hyperplanes for Noise-Robust Representation Learning

DETRs with Collaborative Hybrid Assignments Training

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Generating Dynamic Kernels via Transformers for Lane Detection

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors

Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability

Deep Active Contours for Real-time 6-DoF Object Tracking

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Discriminability Distillation in Group Representation Learning

More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning

Camera Auto-Calibration from the Steiner Conic of the Fundamental Matrix

Unifying Visual Perception by Dispersible Points Learning

Self-Slimmed Vision Transformer

Rethinking Robust Representation Learning under Fine-Grained Noisy Faces

Towards Robust Face Recognition with Comprehensive Search

GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

Masked Autoencoders Are Stronger Knowledge Distillers

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

MangaNinja: Line Art Colorization with Precise Reference Following

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy

ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing

VACE: All-in-One Video Creation and Editing

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment

UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments

Improving Pointing Accuracy for 3D Target Selection in Virtual Reality through Depth Perception Biases Correction

As Pseudo-Label Free as Possible: Leveraging Adaptive Feature Generation for Sparsely Annotated Object Detection

CI-STHPAN: Pre-trained Attention Network for Stock Selection with Channel-Independent Spatio-Temporal Hypergraph

Critic-Guided Decision Transformer for Offline Reinforcement Learning

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Causality-Inspired Invariant Representation Learning for Text-Based Person Retrieval

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

AnyDoor: Zero-shot Object-level Image Customization

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

EasyDrag: Efficient Point-based Manipulation on Diffusion Models

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Derivative Estimation in Random Design

Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

Customizable Image Synthesis with Multiple Subjects

K-Means Clustering with Distributed Dimensions