Sangmin Lee

Papers: 15

Total Citations: 57

Papers (15)

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

SocialGesture: Delving into Multi-person Gesture Understanding

Question-Aware Gaussian Experts for Audio-Visual Question Answering

Self-supervised Debiasing Using Low Rank Regularization

Object-aware Sound Source Localization via Audio-Visual Scene Understanding

Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI

MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization

IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution

Defining Neural Network Architecture through Polytope Structures of Datasets

DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection

LAMA-UT: Language Agnostic Multilingual ASR Through Orthography Unification and Language-Specific Transliteration

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation