124
Papers
2,328
Total Citations
1
Affiliation

Affiliations

Tencent Youtu Lab

Papers (124)

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

ICLR 2024
1,366
citations

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

CVPR 2024
449
citations

Cascade Graph Neural Networks for RGB-D Salient Object Detection

ECCV 2020
113
citations

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

ICCV 2025
58
citations

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025
40
citations

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

CVPR 2024
37
citations

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

ICLR 2025
35
citations

Multi-Space Alignments Towards Universal LiDAR Segmentation

CVPR 2024
30
citations

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

CVPR 2024
29
citations

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

NeurIPS 2025
26
citations

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

CVPR 2025
17
citations

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

CVPR 2024
16
citations

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

CVPR 2024
16
citations

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

CVPR 2024
13
citations

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

ICLR 2025
12
citations

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

AAAI 2024 (arXiv)
10
citations

V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

CVPR 2025
10
citations

MobileInst: Video Instance Segmentation on the Mobile

AAAI 2024 (arXiv)
10
citations

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

ECCV 2024
10
citations

CADDreamer: CAD Object Generation from Single-view Images

CVPR 2025
9
citations

Inverse Weight-Balancing for Deep Long-Tailed Learning

AAAI 2024
7
citations

MetaCARD: Meta-Reinforcement Learning with Task Uncertainty Feedback via Decoupled Context-Aware Reward and Dynamics Components

AAAI 2024
4
citations

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

AAAI 2025
4
citations

Symbolic Neural Ordinary Differential Equations

AAAI 2025
3
citations

MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks

ECCV 2024
2
citations

RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler

CVPR 2025
2
citations

Learning Latent Dynamic Robust Representations for World Models

ICML 2024
0
citations

A Unified Adaptive Testing System Enabled by Hierarchical Structure Search

ICML 2024
0
citations

Simplified Mirror-Based Camera Pose Computation via Rotation Averaging

CVPR 2015
0
citations

Object-Aware Dense Semantic Correspondence

CVPR 2017
0
citations

NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences

CVPR 2019
0
citations

Target-Aware Deep Tracking

CVPR 2019
0
citations

RF-Net: An End-To-End Image Matching Network Based on Receptive Field

CVPR 2019
0
citations

LO-Net: Deep Real-Time Lidar Odometry

CVPR 2019
0
citations

Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search

CVPR 2019
0
citations

Probabilistic Model Distillation for Semantic Correspondence

CVPR 2021
0
citations

Learning Semantic Person Image Generation by Region-Adaptive Normalization

CVPR 2021 (arXiv)
0
citations

Mutual Graph Learning for Camouflaged Object Detection

CVPR 2021 (arXiv)
0
citations

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

CVPR 2021 (arXiv)
0
citations

Multi-Object Tracking Meets Moving UAV

CVPR 2022
0
citations

Learning Optical Flow With Kernel Patch Attention

CVPR 2022
0
citations

Unsupervised Learning of Accurate Siamese Tracking

CVPR 2022 (arXiv)
0
citations

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

CVPR 2022 (arXiv)
0
citations

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

CVPR 2022 (arXiv)
0
citations

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

CVPR 2022 (arXiv)
0
citations

Neural Collaborative Graph Machines for Table Structure Recognition

CVPR 2022 (arXiv)
0
citations

SCPNet: Semantic Scene Completion on Point Cloud

CVPR 2023 (arXiv)
0
citations

Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective

CVPR 2023 (arXiv)
0
citations

LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion

CVPR 2023 (arXiv)
0
citations

Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring

CVPR 2023
0
citations

Micron-BERT: BERT-Based Facial Micro-Expression Recognition

CVPR 2023
0
citations

Vector Quantization With Self-Attention for Quality-Independent Representation Learning

CVPR 2023
0
citations

Virtual Sparse Convolution for Multimodal 3D Object Detection

CVPR 2023 (arXiv)
0
citations

Low-Rank Tensor Approximation With Laplacian Scale Mixture Modeling for Multiframe Image Denoising

ICCV 2015
0
citations

3D Fragment Reassembly Using Integrated Template Guidance and Fracture-Region Matching

ICCV 2015
0
citations

Semi-Supervised Zero-Shot Classification With Label Representation Learning

ICCV 2015
0
citations

FoveaNet: Perspective-Aware Urban Scene Parsing

ICCV 2017 (arXiv)
0
citations

SBGAR: Semantics Based Group Activity Recognition

ICCV 2017
0
citations

Video Scene Parsing With Predictive Feature Learning

ICCV 2017 (arXiv)
0
citations

Adversarial Examples Detection in Deep Networks With Convolutional Filter Statistics

ICCV 2017 (arXiv)
0
citations

BMN: Boundary-Matching Network for Temporal Action Proposal Generation

ICCV 2019
0
citations

Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis

ICCV 2019
0
citations

Paint Transformer: Feed Forward Neural Painting With Stroke Prediction

ICCV 2021 (arXiv)
0
citations

Saliency-Associated Object Tracking

ICCV 2021 (arXiv)
0
citations

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer

ICCV 2021 (arXiv)
0
citations

Uncertainty-Guided Transformer Reasoning for Camouflaged Object Detection

ICCV 2021
0
citations

CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations

ICCV 2023
0
citations

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

ICCV 2023 (arXiv)
0
citations

Surface Extraction from Neural Unsigned Distance Fields

ICCV 2023 (arXiv)
0
citations

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

ICCV 2023 (arXiv)
0
citations

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

ICCV 2023 (arXiv)
0
citations

Low-Light Image Enhancement with Multi-Stage Residue Quantization and Brightness-Aware Attention

ICCV 2023
0
citations

Batch-based Model Registration for Fast 3D Sherd Reconstruction

ICCV 2023 (arXiv)
0
citations

Fast Full-frame Video Stabilization with Iterative Optimization

ICCV 2023 (arXiv)
0
citations

LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution

ICCV 2023 (arXiv)
0
citations

Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells

ICCV 2023 (arXiv)
0
citations

CiteTracker: Correlating Image and Text for Visual Tracking

ICCV 2023 (arXiv)
0
citations

LIRA: Lifelong Image Restoration from Unknown Blended Distortions

ECCV 2020
0
citations

DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition

ECCV 2020
0
citations

Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction

ECCV 2020
0
citations

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

ECCV 2020
0
citations

Uncertainty Learning in Kernel Estimation for Multi-stage Blind Image Super-Resolution

ECCV 2022
0
citations

Neural Color Operators for Sequential Image Retouching

ECCV 2022
0
citations

RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection

ECCV 2022
0
citations

Self-Feature Distillation with Uncertainty Modeling for Degraded Image Recognition

ECCV 2022
0
citations

Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection

ECCV 2022
0
citations

Learning Parametric Sparse Models for Image Super-Resolution

NeurIPS 2016
0
citations

GAFlow: Incorporating Gaussian Attention into Optical Flow

ICCV 2023
0
citations

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

CVPR 2025
0
citations

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

CVPR 2025
0
citations

Parameterized Blur Kernel Prior Learning for Local Motion Deblurring

CVPR 2025
0
citations

Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes

CVPR 2025
0
citations

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

ICCV 2025
0
citations

Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer

ICCV 2025
0
citations

Controllable 3D Outdoor Scene Generation via Scene Graphs

ICCV 2025
0
citations

ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads

ICCV 2025
0
citations

CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition

ICCV 2025
0
citations

Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model

AAAI 2025
0
citations

Training-Free Image Manipulation Localization Using Diffusion Models

AAAI 2025
0
citations

Automated Creation of Reusable and Diverse Toolsets for Enhancing LLM Reasoning

AAAI 2025
0
citations

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

AAAI 2024 (arXiv)
0
citations

Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision

AAAI 2024
0
citations

Improving GNN Calibration with Discriminative Ability: Insights and Strategies

AAAI 2024
0
citations

Pushing the Limit of Fine-Tuning for Few-Shot Learning: Where Feature Reusing Meets Cross-Scale Attention

AAAI 2024
0
citations

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

AAAI 2024 (arXiv)
0
citations

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

CVPR 2024
0
citations

SeD: Semantic-Aware Discriminator for Image Super-Resolution

CVPR 2024
0
citations

RTracker: Recoverable Tracking via PN Tree Structured Memory

CVPR 2024
0
citations

KVQ: Kwai Video Quality Assessment for Short-form Videos

CVPR 2024
0
citations

HRVDA: High-Resolution Visual Document Assistant

CVPR 2024
0
citations

HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection

CVPR 2024
0
citations

From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems

ICML 2024
0
citations

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

ICML 2024
0
citations

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

NeurIPS 2020
0
citations

Uncertainty-Driven Loss for Single Image Super-Resolution

NeurIPS 2021
0
citations

DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

NeurIPS 2021
0
citations

Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning

NeurIPS 2022
0
citations

AttCAT: Explaining Transformers via Attentive Class Activation Tokens

NeurIPS 2022
0
citations

UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models

NeurIPS 2023
0
citations

A Bounded Ability Estimation for Computerized Adaptive Testing

NeurIPS 2023
0
citations

GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

NeurIPS 2023
0
citations

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

NeurIPS 2023
0
citations

GradOrth: A Simple yet Efficient Out-of-Distribution Detection with Orthogonal Projection of Gradients

NeurIPS 2023
0
citations

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader

NeurIPS 2023
0
citations