Shuo Yang

Papers: 38

Total Citations: 113

Papers (38)

MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation

WorldModelBench: Judging Video Generation Models As World Models

Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning

HashAttention: Semantic Sparsity for Faster Inference

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

BOOD: Boundary-based Out-Of-Distribution Data Generation

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

Neural networks on Symmetric Spaces of Noncompact Type

Optimizing Video Object Detection via a Scale-Time Lattice

Region Proposal by Guided Anchoring

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

Compatibility-Aware Heterogeneous Visual Search

Positive-Congruent Training: Towards Regression-Free Model Updates

Single-View 3D Object Reconstruction From Shape Priors in Memory

CAFE: Learning To Condense Dataset by Aligning Features

BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency

Learning Imbalanced Data With Vision Transformers

From Facial Parts Responses to Face Detection: A Deep Learning Approach

FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos

Improving Lens Flare Removal with General-Purpose Pipeline and Multiple Light Sources Recovery

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation

PPR: Physically Plausible Reconstruction from Monocular Videos

One Size Does NOT Fit All: Data-Adaptive Adversarial Training

PartImageNet: A Large, High-Quality Dataset of Parts

Towards Regression-Free Neural Networks for Diverse Compute Platforms

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

Video Summarization Using Denoising Diffusion Probabilistic Model

RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting

Revisiting Context Aggregation for Image Matting

Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

WIDER FACE: A Face Detection Benchmark

Residual Attention Network for Image Classification

Look at Boundary: A Boundary-Aware Face Alignment Algorithm

Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space

Does Preprocessing Help Training Over-parameterized Neural Networks?

Toward Understanding Privileged Features Distillation in Learning-to-Rank