Zhenguo Li

Papers: 74

Total Citations: 536

Papers (74)

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

Accelerating Diffusion Sampling with Optimized Time Steps

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Implicit Search via Discrete Diffusion: A Study on Chess

CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs

Rethinking Performance Estimation in Neural Architecture Search

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

Boosting Few-Shot Learning With Adaptive Margin Loss

iVPF: Numerical Invertible Volume Preserving Flow for Efficient Lossless Compression

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

Transformation Invariant Few-Shot Object Detection

ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-Supervised Continual Learning

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation

Adversarial Invariant Learning

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation

Long-Tail Recognition via Compositional Knowledge Transfer

Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes

OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization

PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework

Mixed Autoencoder for Self-Supervised Visual Representation Learning

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation

DetCo: Unsupervised Contrastive Learning for Object Detection

Towards Understanding the Generative Capability of Adversarially Robust Classifiers

Adversarial Robustness for Unsupervised Domain Adaptation

MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving

NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization

Beyond One-to-One: Rethinking the Referring Image Segmentation

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-efficient Fine-Tuning

DDP: Diffusion Model for Dense Visual Prediction

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

Generative Negative Text Replay for Continual Vision-Language Pretraining

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

Adding Additional Control to One-Step Diffusion with Joint Distribution Matching

Masked Diffusion Models as Energy Minimization

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling

New Insights Into Laplacian Similarity Search

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection

Spatial-Aware Graph Relation Network for Large-Scale Object Detection

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

Locally Differentially Private (Contextual) Bandits Learning

On Effective Scheduling of Model-based Reinforcement Learning

MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps

iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder

OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression

Towards a Theoretical Framework of Out-of-Distribution Generalization

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds

ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization

Complexity Matters: Rethinking the Latent Space for Generative Modeling

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

DiffComplete: Diffusion-based Generative 3D Shape Completion

Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models

SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation