Wenqi Shao

28 Papers

607 Total Citations

Papers (28)

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Distilling Monocular Foundation Model for Fine-grained Depth Completion

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Cross-Subject Mind Decoding from Inaccurate Representations

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

SSN: Learning Sparse Switchable Normalization via SparsestMax

Real-Time Controllable Denoising for Image and Video

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

Beyond One-to-One: Rethinking the Referring Image Segmentation

Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space

Rethinking the Pruning Criteria for Convolutional Neural Network

Foundation Model is Efficient Multimodal Multitask Model Selector