Papers: 52
Total Citations: 2,216
Affiliations: 1

Affiliations

Tencent Youtu Lab

Papers (52)

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

ICLR 2024
1,366
citations

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

CVPR 2024
449
citations

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

ICCV 2025
58
citations

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025
40
citations

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

CVPR 2024
37
citations

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

ICLR 2025 (arXiv)
36
citations

Multi-Space Alignments Towards Universal LiDAR Segmentation

CVPR 2024
30
citations

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

CVPR 2024
29
citations

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

NeurIPS 2025
26
citations

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

CVPR 2025 (arXiv)
17
citations

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

CVPR 2024
16
citations

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

CVPR 2024
16
citations

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

CVPR 2024
13
citations

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

ICLR 2025
12
citations

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

ECCV 2024 (arXiv)
10
citations

MobileInst: Video Instance Segmentation on the Mobile

AAAI 2024 (arXiv)
10
citations

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

AAAI 2024 (arXiv)
10
citations

V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

CVPR 2025
10
citations

CADDreamer: CAD Object Generation from Single-view Images

CVPR 2025
9
citations

Inverse Weight-Balancing for Deep Long-Tailed Learning

AAAI 2024
7
citations

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

AAAI 2025
4
citations

MetaCARD: Meta-Reinforcement Learning with Task Uncertainty Feedback via Decoupled Context-Aware Reward and Dynamics Components

AAAI 2024
4
citations

Symbolic Neural Ordinary Differential Equations

AAAI 2025
3
citations

RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler

CVPR 2025
2
citations

MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks

ECCV 2024
2
citations

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

ICML 2024
0
citations

Learning Latent Dynamic Robust Representations for World Models

ICML 2024
0
citations

A Unified Adaptive Testing System Enabled by Hierarchical Structure Search

ICML 2024
0
citations

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

CVPR 2025
0
citations

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

CVPR 2025
0
citations

Parameterized Blur Kernel Prior Learning for Local Motion Deblurring

CVPR 2025
0
citations

Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes

CVPR 2025
0
citations

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

ICCV 2025
0
citations

Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer

ICCV 2025
0
citations

Controllable 3D Outdoor Scene Generation via Scene Graphs

ICCV 2025
0
citations

ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads

ICCV 2025
0
citations

CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition

ICCV 2025
0
citations

Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model

AAAI 2025
0
citations

Training-Free Image Manipulation Localization Using Diffusion Models

AAAI 2025
0
citations

Automated Creation of Reusable and Diverse Toolsets for Enhancing LLM Reasoning

AAAI 2025
0
citations

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

AAAI 2024 (arXiv)
0
citations

Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision

AAAI 2024
0
citations

Improving GNN Calibration with Discriminative Ability: Insights and Strategies

AAAI 2024
0
citations

Pushing the Limit of Fine-Tuning for Few-Shot Learning: Where Feature Reusing Meets Cross-Scale Attention

AAAI 2024
0
citations

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

AAAI 2024 (arXiv)
0
citations

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

CVPR 2024
0
citations

SeD: Semantic-Aware Discriminator for Image Super-Resolution

CVPR 2024
0
citations

RTracker: Recoverable Tracking via PN Tree Structured Memory

CVPR 2024
0
citations

KVQ: Kwai Video Quality Assessment for Short-form Videos

CVPR 2024
0
citations

HRVDA: High-Resolution Visual Document Assistant

CVPR 2024
0
citations

HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection

CVPR 2024
0
citations

From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems

ICML 2024
0
citations