Ser-Nam Lim

14 Papers

265 Total Citations

Papers (14)

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Jack of All Tasks, Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Few-Shot Object Detection with Foundation Models

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Composing Object Relations and Attributes for Image-Text Matching

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Fast Encoding and Decoding for Implicit Video Representation

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Object Recognition as Next Token Prediction

UniMODE: Unified Monocular 3D Object Detection

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Generative Zero-Shot Composed Image Retrieval