Ning Zhang

Papers: 22

Total Citations: 22

Papers (22)

M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Deep Video Inverse Tone Mapping Based on Temporal Clues

Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners

Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

Compact Bilinear Pooling

Deep Reinforcement Learning-Based Image Captioning With Embedding Reward

Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks

Connecting What To Say With Where To Look by Modeling Human Attention Traces

Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment

Revisiting the Stack-Based Inverse Tone Mapping

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples

RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts

Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation

Dynamic Kernel Distillation for Efficient Pose Estimation in Videos

Laplace Landmark Localization

Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields