Ser-Nam Lim

45 Papers

265 Total Citations

Papers (45)

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Few-Shot Object Detection with Foundation Models

Jack of All Tasks, Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Composing Object Relations and Attributes for Image-Text Matching

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses

Fast Encoding and Decoding for Implicit Video Representation

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Towards Scalable Neural Representation for Diverse Videos

Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning

TIPI: Test Time Adaptation With Transformation Invariance

Detecting Everything in the Open World: Towards Universal Object Detection

Computationally Budgeted Continual Learning: What Does Matter?

HNeRV: A Hybrid Neural Representation for Videos

Enhancing Adversarial Example Transferability With an Intermediate Level Attack

Cross-X Learning for Fine-Grained Visual Categorization

Exploring Visual Engagement Signals for Representation Learning

Joint Audio-Visual Deepfake Detection

Deep Co-Training With Task Decomposition for Semi-Supervised Domain Adaptation

Robustness and Generalization via Generative Adversarial Training

Open-vocabulary Panoptic Segmentation with Embedding Modulation

BT^2: Backward-compatible Training with Basis Transformation

Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right?

Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors

Quantization Guided JPEG Artifact Correction

Curriculum Manager for Source Selection in Multi-Source Domain Adaptation

A Metric Learning Reality Check

What makes fake images detectable? Understanding properties that generalize

Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions

Totems: Physical Objects for Verifying Visual Integrity

MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning

Visual Prompt Tuning

Generative Zero-Shot Composed Image Retrieval

Object-Centric Unsupervised Image Captioning

UniMODE: Unified Monocular 3D Object Detection

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Object Recognition as Next Token Prediction

One-Shot Domain Adaptation for Face Generation

Intentonomy: A Dataset and Study Towards Human Intent Understanding

Efficient Object Embedding for Spliced Image Retrieval

On Feature Normalization and Data Augmentation

ObjectFormer for Image Manipulation Detection and Localization

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition