Qi Zhao

41 Papers

93 Total Citations

Papers (41)

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

Model Lineage Closeness Analysis

Explainable Saliency: Articulating Reasoning with Contextual Prioritization

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Beyond Average: Individualized Visual Scanpath Prediction

ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck

SALICON: Saliency in Context

Label Consistent Quadratic Surrogate Model for Visual Saliency Prediction

A Paradigm for Building Generalized Models of Human Image Perception Through Data Fusion

Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks

Emotional Attention: A Study of Image Sentiment and Visual Attention

Learning to Detect Human-Object Interactions With Knowledge

Learning to Learn From Noisy Labeled Data

Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention

Predicting Human Scanpaths in Visual Question Answering

Explicit Knowledge Incorporation for Visual Reasoning

REX: Reasoning-Aware and Grounded Explanation

Query and Attention Augmentation for Knowledge-Based Explainable Reasoning

VisualHow: Multimodal Problem Solving

Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks

Dual-Glance Model for Deciphering Social Relationships

Learning Visual Attention to Identify People With Autism Spectrum Disorder

Attention-Based Autism Spectrum Disorder Screening With Privileged Modality

Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge

AiR: Attention with Reasoning Capability

n-Reference Transfer Learning for Saliency Prediction

New Datasets and Models for Contextual Reasoning in Visual Dialog

Two Sides of the Same Coin: Learning the Backdoor to Remove the Backdoor

Synthetic Video Enhances Physical Fidelity in Video Synthesis

CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

Unsupervised Learning of View-invariant Action Representations

Learning metrics for persistence-based summaries and applications for graph classification

Learning to Predict Trustworthiness with Steep Slope Loss

NN-Baker: A Neural-network Infused Algorithmic Framework for Optimization Problems on Geometric Intersection Graphs

What Do Deep Saliency Models Learn about Visual Attention?

Safe Subspace Screening for Nuclear Norm Regularized Least Squares Problems