Pan Zhou

57 Papers, 230 Total Citations

Papers (57)

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Diffusion Time-step Curriculum for One Image to 3D Generation

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning

Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation

InceptionNeXt: When Inception Meets ConvNeXt

Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Outlier-Robust Tensor PCA

Deep Adversarial Subspace Clustering

MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation

Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding

MetaFormer Is Actually What You Need for Vision

Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees

Position-Guided Text Prompt for Vision-Language Pre-Training

You Can Ground Earlier Than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-Identification

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

Masked Diffusion Transformer is a Strong Image Synthesizer

Self-Promoted Supervision for Few-Shot Transformer

DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition

Video Graph Transformer for Video Question Answering

Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization

Memory-Efficient 4-bit Preconditioned Stochastic Optimization

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs

Graph Agent Network: Empowering Nodes with Inference Capabilities for Adversarial Resilience

Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending Against Poisoning Attacks

Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration

What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

Towards Inductive Robustness: Distilling and Fostering Wave-Induced Resonance in Transductive GCNs against Graph Adversarial Attacks

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language

Sparse Enhanced Network: An Adversarial Generation Method for Robust Augmentation in Sequential Recommendation

Few-shot Learner Parameterization by Diffusion Time-steps

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Friendly Sharpness-Aware Minimization

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Efficient Stochastic Gradient Hard Thresholding

Efficient Meta Learning via Minibatch Proximal Update

Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

Theory-Inspired Path-Regularized Differential Network Architecture Search

Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning

A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

Inception Transformer (NeurIPS 2022)

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Understanding Generalization and Optimization Performance of Deep CNNs