Chen Li

53 Papers

143 Total Citations

Papers (53)

ST-LLM: Large Language Models Are Effective Temporal Learners

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

AugDETR: Improving Multi-scale Learning for Detection Transformer

Detecting Adversarial Data Using Perturbation Forgery

TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly

DAMap: Distance-aware MapNet for High Quality HD Map Construction

IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A

Morph: A Motion-free Physics Optimization Framework for Human Motion Generation

Text-guided Visual Prompt DINO for Generic Segmentation

RemDet: Rethinking Efficient Model Design for UAV Object Detection

Mamba YOLO: A Simple Baseline for Object Detection with State Space Model

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

GxVAEs: Two Joint VAEs Generate Hit Molecules from Gene Expression Profiles

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation

Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective

Simulating Makeup Through Physics-Based Manipulation of Intrinsic Image Layers

Specular Highlight Removal in Facial Images

Radiometric Calibration From Faces in Images

Convolutional Sequence to Sequence Model for Human Dynamics

MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction

Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network

Viewport Proposal CNN for 360° Video Quality Assessment

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

Distribution Consistent Neural Architecture Search

Computing Wasserstein-p Distance Between Images With Linear Cost

DLFormer: Discrete Latent Transformer for Video Inpainting

ScarceNet: Animal Pose Estimation With Scarce Annotations

NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects

Weak-Shot Object Detection Through Mutual Knowledge Transfer

Efficient Diffusion Training via Min-SNR Weighting Strategy

DETR Does Not Need Multi-Scale or Locality Design

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

Unleashing the Potential of Spiking Neural Networks with Dynamic Confidence

Weakly-supervised 3D Pose Transfer with Keypoints

A Simple Approach and Benchmark for 21,000-Category Object Detection

Hierarchical Feature Embedding for Visual Tracking

Overcoming the Trade-Off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Learning Efficient and Generalizable Human Representation with Human Gaussian Model

Coarse-to-fine Animal Pose and Shape Estimation

GNeSF: Generalizable Neural Semantic Fields

Formulating Discrete Probability Flow Through Optimal Transport