Ning Zhang

22 Papers
22 Total Citations

Papers (22)

M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis

AAAI 2024
14
citations

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

CVPR 2025 · arXiv
7
citations

Deep Video Inverse Tone Mapping Based on Temporal Clues

CVPR 2024
1
citation

Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners

ICML 2024
0
citations

Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

CVPR 2015
0
citations

Compact Bilinear Pooling

CVPR 2016
0
citations

Deep Reinforcement Learning-Based Image Captioning With Embedding Reward

CVPR 2017 · arXiv
0
citations

Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks

CVPR 2019
0
citations

Connecting What To Say With Where To Look by Modeling Human Attention Traces

CVPR 2021 · arXiv
0
citations

Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment

CVPR 2022 · arXiv
0
citations

Revisiting the Stack-Based Inverse Tone Mapping

CVPR 2023
0
citations

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

CVPR 2023
0
citations

SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples

CVPR 2023
0
citations

RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts

CVPR 2023
0
citations

Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation

CVPR 2023 · arXiv
0
citations

Dynamic Kernel Distillation for Efficient Pose Estimation in Videos

ICCV 2019
0
citations

Laplace Landmark Localization

ICCV 2019
0
citations

Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model

ECCV 2020
0
citations

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

CVPR 2025
0
citations

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

CVPR 2025
0
citations

Apollo: An Exploration of Video Understanding in Large Multimodal Models

CVPR 2025
0
citations

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

NeurIPS 2022
0
citations