Ning Zhang
Papers: 22
Total Citations: 22
Papers (22)
M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis
AAAI 2024
14
citations
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
CVPR 2025, arXiv
7
citations
Deep Video Inverse Tone Mapping Based on Temporal Clues
CVPR 2024
1
citations
Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners
ICML 2024
0
citations
Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues
CVPR 2015
0
citations
Compact Bilinear Pooling
CVPR 2016
0
citations
Deep Reinforcement Learning-Based Image Captioning With Embedding Reward
CVPR 2017, arXiv
0
citations
Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks
CVPR 2019
0
citations
Connecting What To Say With Where To Look by Modeling Human Attention Traces
CVPR 2021, arXiv
0
citations
Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
CVPR 2022, arXiv
0
citations
Revisiting the Stack-Based Inverse Tone Mapping
CVPR 2023
0
citations
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
CVPR 2023
0
citations
SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples
CVPR 2023
0
citations
RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts
CVPR 2023
0
citations
Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation
CVPR 2023, arXiv
0
citations
Dynamic Kernel Distillation for Efficient Pose Estimation in Videos
ICCV 2019
0
citations
Laplace Landmark Localization
ICCV 2019
0
citations
Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model
ECCV 2020
0
citations
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
CVPR 2025
0
citations
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
CVPR 2025
0
citations
Apollo: An Exploration of Video Understanding in Large Multimodal Models
CVPR 2025
0
citations
Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
NeurIPS 2022
0
citations