Humphrey Shi

40 Papers

446 Total Citations

3 Affiliations

Affiliations

Oregon, Georgia Tech, UIUC

Papers (40)

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Benchmarking Object Detectors with COCO: A New Path Forward

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

Learning to Track Instances without Video Annotations

DiSparse: Disentangled Sparsification for Multitask Model Compression

VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

Object Localization Under Single Coarse Point Supervision

Towards Layer-Wise Image Vectorization

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

OneFormer: One Transformer To Rule Universal Image Segmentation

Graph Transformer GANs for Graph-Constrained House Generation

Automatic High Resolution Wire Segmentation and Removal

Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning

Neighborhood Attention Transformer

Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification

Interpretable Visual Reasoning via Induced Symbolic Space

MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition

Point-to-Box Network for Accurate Object Detection via Single Point Supervision

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

HyPiDecoder: Hybrid Pixel Decoder for Efficient Segmentation and Detection

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Brush2Prompt: Contextual Prompt Generator for Object Inpainting

Rethinking Text Segmentation: A Novel Dataset and a Text-Specific Refinement Approach

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Mask Matching Transformer for Few-Shot Segmentation

Learning Mask-aware CLIP Representations for Zero-Shot Segmentation