Ser-Nam Lim

Papers: 43

Total Citations: 253

Papers (43)

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Jack of All Tasks, Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Few-Shot Object Detection with Foundation Models

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Composing Object Relations and Attributes for Image-Text Matching

Fast Encoding and Decoding for Implicit Video Representation

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Generative Zero-Shot Composed Image Retrieval

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

UniMODE: Unified Monocular 3D Object Detection

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Object Recognition as Next Token Prediction

One-Shot Domain Adaptation for Face Generation

Intentonomy: A Dataset and Study Towards Human Intent Understanding

Efficient Object Embedding for Spliced Image Retrieval

On Feature Normalization and Data Augmentation

ObjectFormer for Image Manipulation Detection and Localization

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Towards Scalable Neural Representation for Diverse Videos

Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning

TIPI: Test Time Adaptation With Transformation Invariance

Detecting Everything in the Open World: Towards Universal Object Detection

Computationally Budgeted Continual Learning: What Does Matter?

HNeRV: A Hybrid Neural Representation for Videos

Exploring Visual Engagement Signals for Representation Learning

Joint Audio-Visual Deepfake Detection

Deep Co-Training With Task Decomposition for Semi-Supervised Domain Adaptation

Robustness and Generalization via Generative Adversarial Training

Open-vocabulary Panoptic Segmentation with Embedding Modulation

BT^2: Backward-compatible Training with Basis Transformation

Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right?

Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors

Quantization Guided JPEG Artifact Correction

Curriculum Manager for Source Selection in Multi-Source Domain Adaptation

A Metric Learning Reality Check

What makes fake images detectable? Understanding properties that generalize

Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions

Totems: Physical Objects for Verifying Visual Integrity

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

Visual Prompt Tuning

Object-Centric Unsupervised Image Captioning