Wenqi Shao

20 Papers

608 Total Citations

Papers (20)

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Cross-Subject Mind Decoding from Inaccurate Representations

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space