Papers (52)
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
ICLR 2024
1,366
citations
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
CVPR 2024
449
citations
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
ICCV 2025
58
citations
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
40
citations
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
CVPR 2024
37
citations
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
ICLR 2025 (arXiv)
36
citations
Multi-Space Alignments Towards Universal LiDAR Segmentation
CVPR 2024
30
citations
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
CVPR 2024
29
citations
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
NeurIPS 2025
26
citations
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
CVPR 2025 (arXiv)
17
citations
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
CVPR 2024
16
citations
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
CVPR 2024
16
citations
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
CVPR 2024
13
citations
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
ICLR 2025
12
citations
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
ECCV 2024 (arXiv)
10
citations
MobileInst: Video Instance Segmentation on the Mobile
AAAI 2024 (arXiv)
10
citations
Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation
AAAI 2024 (arXiv)
10
citations
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
CVPR 2025
10
citations
CADDreamer: CAD Object Generation from Single-view Images
CVPR 2025
9
citations
Inverse Weight-Balancing for Deep Long-Tailed Learning
AAAI 2024
7
citations
TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation
AAAI 2025
4
citations
MetaCARD: Meta-Reinforcement Learning with Task Uncertainty Feedback via Decoupled Context-Aware Reward and Dynamics Components
AAAI 2024
4
citations
Symbolic Neural Ordinary Differential Equations
AAAI 2025
3
citations
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
CVPR 2025
2
citations
MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
ECCV 2024
2
citations
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
ICML 2024
0
citations
Learning Latent Dynamic Robust Representations for World Models
ICML 2024
0
citations
A Unified Adaptive Testing System Enabled by Hierarchical Structure Search
ICML 2024
0
citations
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
CVPR 2025
0
citations
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
CVPR 2025
0
citations
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
CVPR 2025
0
citations
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
CVPR 2025
0
citations
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
ICCV 2025
0
citations
Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer
ICCV 2025
0
citations
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
0
citations
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
ICCV 2025
0
citations
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
ICCV 2025
0
citations
Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model
AAAI 2025
0
citations
Training-Free Image Manipulation Localization Using Diffusion Models
AAAI 2025
0
citations
Automated Creation of Reusable and Diverse Toolsets for Enhancing LLM Reasoning
AAAI 2025
0
citations
Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection
AAAI 2024 (arXiv)
0
citations
Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision
AAAI 2024
0
citations
Improving GNN Calibration with Discriminative Ability: Insights and Strategies
AAAI 2024
0
citations
Pushing the Limit of Fine-Tuning for Few-Shot Learning: Where Feature Reusing Meets Cross-Scale Attention
AAAI 2024
0
citations
SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking
AAAI 2024 (arXiv)
0
citations
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
CVPR 2024
0
citations
SeD: Semantic-Aware Discriminator for Image Super-Resolution
CVPR 2024
0
citations
RTracker: Recoverable Tracking via PN Tree Structured Memory
CVPR 2024
0
citations
KVQ: Kwai Video Quality Assessment for Short-form Videos
CVPR 2024
0
citations
HRVDA: High-Resolution Visual Document Assistant
CVPR 2024
0
citations
HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection
CVPR 2024
0
citations
From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems
ICML 2024
0
citations