Wei Li

88 Papers

1,033 Total Citations

Papers (88)

SALMONN: Towards Generic Hearing Abilities for Large Language Models

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

OMG-Seg: Is One Model Good Enough For All Segmentation?

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

F-LMM: Grounding Frozen Large Multimodal Models

Delta Decompression for MoE-based LLMs Compression

Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

Can a Large Language Model be a Gaslighter?

Uni-LoRA: One Vector is All You Need

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Efficient Spiking Point Mamba for Point Cloud Analysis

CGS-Mask: Making Time Series Predictions Intuitive for All

Transferable Semantic Augmentation for Domain Adaptation

Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks

MeMOT: Multi-Object Tracking With Memory

PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis

Balancing Logit Variation for Long-Tailed Semantic Segmentation

Siamese DETR

Correlational Image Modeling for Self-Supervised Visual Pre-Training

Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

Attribute Recognition by Joint Recurrent Learning of Context and Correlation

Semantic Concentration for Domain Adaptation

Adaptive Surface Normal Constraint for Depth Estimation

A Simple Feature Augmentation for Domain Generalization

Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation

DVI: Depth Guided Video Inpainting for Autonomous Driving

Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting

Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition

Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks

Open-Vocabulary DETR with Conditional Matching

Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies

FindIt: Generalized Localization with Natural Language Queries

Generalizing GANs: A Turing Perspective

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

LMO: Linear Mamba Operator for MRI Reconstruction

Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion

WildAvatar: Learning In-the-wild 3D Avatars from the Web

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation

AIRA: Activation-Informed Low-Rank Adaptation for Large Models

HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Breaking Information Isolation: Accelerating MRI via Inter-sequence Mapping and Progressive Masking

GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expressions

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

Multi-Modal Disordered Representation Learning Network for Description-Based Person Search

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

AutoOS: Make Your OS More Powerful by Exploiting Large Language Models

Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing

Appearance-and-Relation Networks for Video Classification

Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification

Harmonious Attention Network for Person Re-Identification

Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution

WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Dynamic Domain Adaptation for Efficient Inference

Improved Expressivity Through Dendritic Neural Networks

MST: Masked Self-Supervised Transformer for Visual Representation

DeepInteraction: 3D Object Detection via Modality Interaction

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Delving into Out-of-Distribution Detection with Vision-Language Representations

TransHP: Image Classification with Hierarchical Prompting

“Why Not Looking backward?” A Robust Two-Step Method to Automatically Terminate Bayesian Optimization

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image