Peng Wang

Papers: 39

Total Citations: 1,640

Papers (39)

MVDream: Multi-view Diffusion for 3D Generation

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Open-Vocabulary Video Anomaly Detection

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Towards Continual Knowledge Graph Embedding via Incremental Distillation

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

COCONut: Modernizing COCO Segmentation

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

NeurIPS 2025, arXiv

PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation

Unify Named Entity Recognition Scenarios via Contrastive Real-Time Updating Prototype

Attention-Only Transformers via Unrolled Subspace Denoising

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

ConsistNER: Towards Instructive NER Demonstrations for LLMs with the Consistency of Ontology and Context

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Generalized Neural Collapse for a Large Number of Classes

Symmetric Matrix Completion with ReLU Sampling

Image Fusion via Vision-Language Model

The Emergence of Reproducibility and Consistency in Diffusion Models

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Unlocking Generalization Power in LiDAR Point Cloud Registration

A Global Geometric Analysis of Maximal Coding Rate Reduction

SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Dual Diffusion for Unified Image Generation and Understanding

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association

RayZer: A Self-supervised Large View Synthesis Model

A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness

Implicit Counterfactual Learning for Audio-Visual Segmentation

Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval

A Lightweight Sparse Interaction Network for Time Series Forecasting

OntoFact: Unveiling Fantastic Fact-Skeleton of LLMs via Ontology-Driven Reinforcement Learning