Wei Li

Papers: 43

Total Citations: 1,033

Papers (43)

SALMONN: Towards Generic Hearing Abilities for Large Language Models

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

OMG-Seg: Is One Model Good Enough For All Segmentation?

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

F-LMM: Grounding Frozen Large Multimodal Models

Delta Decompression for MoE-based LLMs Compression

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

Can a Large Language Model be a Gaslighter?

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

Uni-LoRA: One Vector is All You Need

NeurIPS 2025, arXiv

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Efficient Spiking Point Mamba for Point Cloud Analysis

CGS-Mask: Making Time Series Predictions Intuitive for All

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

Breaking Information Isolation: Accelerating MRI via Inter-sequence Mapping and Progressive Masking

HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation

GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expressions

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

AIRA: Activation-Informed Low-Rank Adaptation for Large Models

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Multi-Modal Disordered Representation Learning Network for Description-Based Person Search

AutoOS: Make Your OS More Powerful by Exploiting Large Language Models

Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

WildAvatar: Learning In-the-wild 3D Avatars from the Web

Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion

LMO: Linear Mamba Operator for MRI Reconstruction

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area