Shengbang Tong

Papers: 8

Total Citations: 628

Papers (8)

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Scaling Language-Free Visual Representation Learning

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Unsupervised Manifold Linearizing and Clustering

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Revisiting Sparse Convolutional Model for Visual Recognition

White-Box Transformers via Sparse Rate Reduction

Mass-Producing Failures of Multimodal Systems with Language Models