Kai Zhang

Papers: 45

Total Citations: 1,404

Affiliations: 1

Affiliations

Department of Computer Science and Engineering, The Ohio State University

Papers (45)

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Deep Equilibrium Diffusion Restoration with Parallel Sampling

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Gaussian Mixture Flow Matching Models

Turbo3D: Ultra-fast Text-to-3D Generation

DATENeRF: Depth-Aware Text-based Editing of NeRFs

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Generating 3D-Consistent Videos from Unposed Internet Photos

A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking

Enhancing Low-Light Images: A Synthetic Data Perspective on Practical and Generalizable Solutions

Reverse Convolution and Its Applications to Image Restoration

Intent Oriented Contrastive Learning for Sequential Recommendation

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

Equivariant Multi-Modality Image Fusion

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Spin: Diffusion-based Semantic Image Painting Through Independent Information Injection

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

RayZer: A Self-supervised Large View Synthesis Model

DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion

Federated Self-Explaining GNNs with Anti-shortcut Augmentations

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Lightweight Image Super-Resolution via Flexible Meta Pruning

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

Adaptive Multimodal Fusion: Dynamic Attention Allocation for Intent Recognition

DiffRAW: Leveraging Diffusion Model to Generate DSLR-Comparable Perceptual Quality sRGB from Smartphone RAW Images