Yanwei Li

19 Papers

1,667 Total Citations

Papers (19)

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

LISA: Reasoning Segmentation via Large Language Model

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Attention-Guided Unified Network for Panoptic Segmentation

Learning Dynamic Routing for Semantic Segmentation

Multi-Scale Aligned Distillation for Low-Resolution Detection

Scale-Aware Automatic Augmentation for Object Detection

Voxel Field Fusion for 3D Object Detection

Focal Sparse Convolutional Networks for 3D Object Detection

End-to-end 3D Tracking with Decoupled Queries

Attention-Aware Learning for Hyperparameter Prediction in Image Processing Pipelines

Fully Convolutional Networks for Panoptic Segmentation

Aligning Effective Tokens with Video Anomaly in Large Language Models

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Learnable Tree Filter for Structure-preserving Feature Transform

Rethinking Learnable Tree Filter for Generic Feature Transform

Fine-Grained Dynamic Head for Object Detection

Unifying Voxel-based Representation with Transformer for 3D Object Detection

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction