Jie Zhou

163 Papers
2,784 Total Citations
1 Affiliation

Affiliations

Tencent Inc.

Papers (163)

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

ICLR 2024
1,128
citations

Runtime Neural Pruning

NeurIPS 2017
509
citations

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

ICLR 2024
476
citations

Large Language Models Are Not Robust Multiple Choice Selectors

ICLR 2024
370
citations

Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

ECCV 2020
81
citations

Graph-Based Social Relation Reasoning

ECCV 2020
51
citations

MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation

ECCV 2020
41
citations

FlowIE: Efficient Image Enhancement via Rectified Flow

CVPR 2024
31
citations

LiDAR-based Person Re-identification

CVPR 2024
19
citations

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

CVPR 2024
16
citations

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

ICCV 2025
16
citations

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

CVPR 2025
10
citations

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

CVPR 2025
6
citations

Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection

AAAI 2025
6
citations

Continuous Visual Autoregressive Generation via Score Maximization

ICML 2025
5
citations

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

ECCV 2024
4
citations

Path Choice Matters for Clear Attributions in Path Methods

ICLR 2024
4
citations

Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis

CVPR 2025
4
citations

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

ICCV 2025
3
citations

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

NeurIPS 2025
2
citations

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing

CVPR 2025
2
citations

LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction

CVPR 2024
0
citations

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

CVPR 2024
0
citations

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024
0
citations

Memory-based Adapters for Online 3D Scene Perception

CVPR 2024
0
citations

Towards Accurate Post-training Quantization for Diffusion Models

CVPR 2024
0
citations

Language Generation with Strictly Proper Scoring Rules

ICML 2024
0
citations

Exploring the Benefit of Activation Sparsity in Pre-training

ICML 2024
0
citations

On Prompt-Driven Safeguarding for Large Language Models

ICML 2024
0
citations

Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind

ICML 2024
0
citations

Multi-Manifold Deep Metric Learning for Image Set Classification

CVPR 2015
0
citations

Deep Hashing for Compact Binary Codes Learning

CVPR 2015
0
citations

Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks

CVPR 2016
0
citations

Learning Deep Binary Descriptor With Multi-Quantization

CVPR 2017
0
citations

Consistent-Aware Deep Learning for Person Re-Identification in a Camera Network

CVPR 2017
0
citations

Deep Adversarial Metric Learning

CVPR 2018
0
citations

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition

CVPR 2018
0
citations

Learning Globally Optimized Object Detector via Policy Gradient

CVPR 2018
0
citations

Deep Hashing via Discrepancy Minimization

CVPR 2018
0
citations

GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning

CVPR 2018
0
citations

Hardness-Aware Deep Metric Learning

CVPR 2019
0
citations

Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition

CVPR 2019
0
citations

Learning Channel-Wise Interactions for Binary Convolutional Neural Networks

CVPR 2019
0
citations

Structural Relational Reasoning of Point Clouds

CVPR 2019
0
citations

Deep Fitting Degree Scoring Network for Monocular 3D Object Detection

CVPR 2019
0
citations

BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation

CVPR 2019
0
citations

COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis

CVPR 2019
0
citations

UniformFace: Learning Deep Equidistributed Representation for Face Recognition

CVPR 2019
0
citations

Deep Embedding Learning With Discriminative Sampling Policy

CVPR 2019
0
citations

Enhanced Bayesian Compression via Deep Reinforcement Learning

CVPR 2019
0
citations

BiDet: An Efficient Binarized Object Detector

CVPR 2020
0
citations

Deep Metric Learning via Adaptive Learnable Assessment

CVPR 2020
0
citations

Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

CVPR 2020
0
citations

Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation

CVPR 2020
0
citations

Structure-Preserving Super Resolution With Gradient Guidance

CVPR 2020
0
citations

Uncertainty-Aware Score Distribution Learning for Action Quality Assessment

CVPR 2020
0
citations

Self-Supervised Video Hashing via Bidirectional Transformers

CVPR 2021
0
citations

Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes

CVPR 2021
0
citations

Objects Are Different: Flexible Monocular 3D Object Detection

CVPR 2021
0
citations

Deep Compositional Metric Learning

CVPR 2021
0
citations

Meta-Mining Discriminative Samples for Kinship Verification

CVPR 2021
0
citations

Pseudo Facial Generation With Extreme Poses for Face Recognition

CVPR 2021
0
citations

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

CVPR 2021
0
citations

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

CVPR 2021
0
citations

PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds

CVPR 2021
0
citations

HyperDet3D: Learning a Scene-Conditioned 3D Object Detector

CVPR 2022
0
citations

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

CVPR 2022
0
citations

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

CVPR 2022
0
citations

FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment

CVPR 2022
0
citations

Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement

CVPR 2022
0
citations

Dimension Embeddings for Monocular 3D Object Detection

CVPR 2022
0
citations

Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling

CVPR 2022
0
citations

DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting

CVPR 2022
0
citations

Attributable Visual Similarity Learning

CVPR 2022
0
citations

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

CVPR 2022
0
citations

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search

CVPR 2022
0
citations

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

CVPR 2023
0
citations

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information

CVPR 2023
0
citations

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment

CVPR 2023
0
citations

Deep Factorized Metric Learning

CVPR 2023
0
citations

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

CVPR 2023
0
citations

FLAG3D: A 3D Fitness Activity Dataset With Language Instruction

CVPR 2023
0
citations

Diffusion-SDF: Text-To-Shape via Voxelized Diffusion

CVPR 2023
0
citations

Siamese Image Modeling for Self-Supervised Vision Representation Learning

CVPR 2023
0
citations

Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

CVPR 2023
0
citations

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

CVPR 2023
0
citations

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

CVPR 2023
0
citations

Multiple Feature Fusion via Weighted Entropy for Visual Tracking

ICCV 2015
0
citations

Simultaneous Local Binary Feature Learning and Encoding for Face Recognition

ICCV 2015
0
citations

Local Subspace Collaborative Tracking

ICCV 2015
0
citations

Learning Discriminative Aggregation Network for Video-Based Face Recognition

ICCV 2017
0
citations

Attention-Aware Deep Reinforcement Learning for Video Face Recognition

ICCV 2017
0
citations

Cross-Modal Deep Variational Hashing

ICCV 2017
0
citations

Neighborhood Preserving Hashing for Scalable Video Retrieval

ICCV 2019
0
citations

Deep Meta Metric Learning

ICCV 2019
0
citations

Self-Critical Attention Learning for Person Re-Identification

ICCV 2019
0
citations

Robust Variational Bayesian Point Set Registration

ICCV 2019
0
citations

Group-Aware Contrastive Regression for Action Quality Assessment

ICCV 2021
0
citations

Instance Similarity Learning for Unsupervised Feature Representation

ICCV 2021
0
citations

PoinTr: Diverse Point Cloud Completion With Geometry-Aware Transformers

ICCV 2021
0
citations

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

ICCV 2021
0
citations

Deep Relational Metric Learning

ICCV 2021
0
citations

RandomRooms: Unsupervised Pre-Training From Synthetic Shapes and Randomized Layouts for 3D Object Detection

ICCV 2021
0
citations

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo

ICCV 2021
0
citations

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

ICCV 2021
0
citations

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

CVPR 2025
0
citations

Gait Recognition in the Wild: A Benchmark

ICCV 2021
0
citations

Human Trajectory Prediction via Counterfactual Analysis

ICCV 2021
0
citations

Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-Identification

ICCV 2021
0
citations

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

ICCV 2023
0
citations

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

ICCV 2023
0
citations

Token-Label Alignment for Vision Transformers

ICCV 2023
0
citations

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

ICCV 2023
0
citations

TCOVIS: Temporally Consistent Online Video Instance Segmentation

ICCV 2023
0
citations

CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

ICCV 2023
0
citations

SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving

ICCV 2023
0
citations

Unleashing Text-to-Image Diffusion Models for Visual Perception

ICCV 2023
0
citations

Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification

ECCV 2020
0
citations

Reinforced Axial Refinement Network for Monocular 3D Object Detection

ECCV 2020
0
citations

Structural Deep Metric Learning for Room Layout Estimation

ECCV 2020
0
citations

Deep Hashing with Active Pairwise Supervision

ECCV 2020
0
citations

Rotation-robust Intersection over Union for 3D Object Detection

ECCV 2020
0
citations

Spatial Geometric Reasoning for Room Layout Estimation via Deep Reinforcement Learning

ECCV 2020
0
citations

Shap-CAM: Visual Explanations for Convolutional Neural Networks Based on Shapley Value

ECCV 2022
0
citations

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

ECCV 2022
0
citations

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

ECCV 2022
0
citations

Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution

ECCV 2022
0
citations

AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers

ECCV 2022
0
citations

Dynamic Metric Learning with Cross-Level Concept Distillation

ECCV 2022
0
citations

LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

ECCV 2022
0
citations

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question

NeurIPS 2015
0
citations

Towards Interpretable Deep Metric Learning With Structural Matching

ICCV 2021
0
citations

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models

CVPR 2025
0
citations

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

CVPR 2025
0
citations

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

CVPR 2025
0
citations

Learning Counterfactually Decoupled Attention for Open-World Model Attribution

ICCV 2025
0
citations

EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients

ICCV 2025
0
citations

IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

ICCV 2025
0
citations

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model

ICCV 2025
0
citations

MCID: Multi-aspect Copyright Infringement Detection for Generated Images

ICCV 2025
0
citations

D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

ICCV 2025
0
citations

Authentic 4D Driving Simulation with a Video Generation Model

ICCV 2025
0
citations

SpectralAR: Spectral Autoregressive Visual Generation

ICCV 2025
0
citations

Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment

ICCV 2025
0
citations

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection

ICCV 2025
0
citations

Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space

AAAI 2025
0
citations

Teaching Large Language Models to Translate with Comparison

AAAI 2024
0
citations

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

AAAI 2024
0
citations

Tree-of-Reasoning Question Decomposition for Complex Question Answering with Large Language Models

AAAI 2024
0
citations

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding

AAAI 2024
0
citations

Generative Multi-Modal Knowledge Retrieval with Large Language Models

AAAI 2024
0
citations

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

CVPR 2024
0
citations

Global Filter Networks for Image Classification

NeurIPS 2021
0
citations

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

NeurIPS 2021
0
citations

Topology-Imbalance Learning for Semi-Supervised Node Classification

NeurIPS 2021
0
citations

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

NeurIPS 2022
0
citations

P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

NeurIPS 2022
0
citations

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

NeurIPS 2022
0
citations

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

NeurIPS 2022
0
citations

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

NeurIPS 2023
0
citations

UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

NeurIPS 2023
0
citations

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

NeurIPS 2023
0
citations

Fed-FA: Theoretically Modeling Client Data Divergence for Federated Language Backdoor Defense

NeurIPS 2023
0
citations