Lei Li

49 Papers

1,388 Total Citations

4 Affiliations

Affiliations

Peking University

The University of Hong Kong

University of Virginia

Carnegie Mellon University

Papers (49)

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Provable Robust Watermarking for AI-Generated Text

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

GenZI: Zero-Shot 3D Human-Scene Interaction Generation

Temporal Reasoning Transfer from Text to Video

3D Neural Edge Reconstruction

Position-Aware Guided Point Cloud Completion with CLIP Model

Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching

Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation

An Efficient and Accurate Dynamic Sparse Training Framework Based on Parameter-Freezing

Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations

PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

DIS-CO: Discovering Copyrighted Content in VLMs Training Data

DE-COP: Detecting Copyrighted Content in Language Models Training Data

Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

SurfPro: Functional Protein Design Based on Continuous Surface

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations

End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds

PointDSC: Robust Point Cloud Registration Using Deep Spatial Consistency

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals

Scale-Aware Automatic Augmentation for Object Detection

Locate Then Segment: A Strong Pipeline for Referring Image Segmentation

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Progressive Domain Expansion Network for Single Domain Generalization

Generalizable Local Feature Pre-Training for Deformable Shape Analysis

VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval

SOLO: Segmenting Objects by Locations

Human Motion Instruction Tuning

VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers

LT3SD: Latent Trees for 3D Scene Diffusion

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy

DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching

MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation

To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning

BRITS: Bidirectional Recurrent Imputation for Time Series

Kernelized Bayesian Softmax for Text Generation

SOLOv2: Dynamic and Fast Instance Segmentation

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

Learning Multi-resolution Functional Maps with Spectral Attention for Robust Shape Matching

Statistical Knowledge Assessment for Large Language Models

NeurIPS 2023, arXiv

ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation