Ke Li

Papers: 32

Total Citations: 2,298

h-index: 2

Affiliations: 1

Affiliations

Xidian University

Papers (32)

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Constrained Bayesian Optimization under Partial Observations: Balanced Improvements and Provable Convergence

Weakly Supervised Open-Vocabulary Object Detection

FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection

Reinforcement Learning Friendly Vision-Language Model for Minecraft

SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space

Feature Denoising Diffusion Model for Blind Image Quality Assessment

Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning

Destroy and Repair Using Hyper-Graphs for Routing

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

A General and Efficient Training for Transformer via Token Expansion

Aligning and Prompting Everything All at Once for Universal Visual Perception

Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment

Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion

Radiance Fields in XR: A Survey on How Radiance Fields are Envisioned and Addressed for XR Research

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Know Where You Are From: Event-Based Segmentation via Spatio-Temporal Propagation

ESEG: Event-Based Segmentation Boosted by Explicit Edge-Semantic Guidance

Probability-Density-aware Semi-supervised Learning

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Bridging Sequence-Structure Alignment in RNA Foundation Models

Semi-supervised Blind Image Quality Assessment through Knowledge Distillation and Incremental Learning

Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

PAPR in Motion: Seamless Point-level 3D Scene Interpolation