Wei Li

43
Papers
1,033
Total Citations

Papers (43)

SALMONN: Towards Generic Hearing Abilities for Large Language Models

ICLR 2024
447
citations

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

ICLR 2025
180
citations

OMG-Seg: Is One Model Good Enough For All Segmentation?

CVPR 2024
106
citations

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

CVPR 2024
81
citations

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

CVPR 2024
36
citations

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

CVPR 2025
33
citations

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

ICCV 2025
33
citations

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

ICCV 2025
23
citations

F-LMM: Grounding Frozen Large Multimodal Models

CVPR 2025
21
citations

Delta Decompression for MoE-based LLMs Compression

ICML 2025
18
citations

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

AAAI 2025
9
citations

Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

ECCV 2024
9
citations

CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning

AAAI 2025
4
citations

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

CVPR 2025
4
citations

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

CVPR 2025
4
citations

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

AAAI 2025
4
citations

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

ICLR 2025
3
citations

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

CVPR 2025
3
citations

MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search

NeurIPS 2025
3
citations

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

CVPR 2025
3
citations

Can a Large Language Model be a Gaslighter?

ICLR 2025
2
citations

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

ICCV 2025
2
citations

Uni-LoRA: One Vector is All You Need

NeurIPS 2025
2
citations

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

AAAI 2025
1
citation

Efficient Spiking Point Mamba for Point Cloud Analysis

ICCV 2025
1
citation

CGS-Mask: Making Time Series Predictions Intuitive for All

AAAI 2024
1
citation

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

ICCV 2025
0
citations

Breaking Information Isolation: Accelerating MRI via Inter-sequence Mapping and Progressive Masking

AAAI 2025
0
citations

HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation

ICCV 2025
0
citations

GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expressions

AAAI 2025
0
citations

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

AAAI 2025
0
citations

AIRA: Activation-Informed Low-Rank Adaptation for Large Models

ICCV 2025
0
citations

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

AAAI 2024
0
citations

Multi-Modal Disordered Representation Learning Network for Description-Based Person Search

AAAI 2024
0
citations

AutoOS: Make Your OS More Powerful by Exploiting Large Language Models

ICML 2024
0
citations

Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation

ICCV 2025
0
citations

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

ICCV 2025
0
citations

WildAvatar: Learning In-the-wild 3D Avatars from the Web

CVPR 2025
0
citations

Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion

CVPR 2025
0
citations

LMO: Linear Mamba Operator for MRI Reconstruction

CVPR 2025
0
citations

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

ICML 2024
0
citations

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

ICML 2024
0
citations

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

AAAI 2025
0
citations