Peng Wang

101 Papers

1,823 Total Citations

Papers (101)

MVDream: Multi-view Diffusion for 3D Generation

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

SURGE: Surface Regularized Geometry Estimation from a Single Image

Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks

Open-Vocabulary Video Anomaly Detection

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Towards Continual Knowledge Graph Embedding via Incremental Distillation

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

COCONut: Modernizing COCO Segmentation

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling

PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation

Unify Named Entity Recognition Scenarios via Contrastive Real-Time Updating Prototype

Attention-Only Transformers via Unrolled Subspace Denoising

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Generalized Neural Collapse for a Large Number of Classes

Symmetric Matrix Completion with ReLU Sampling

Image Fusion via Vision-Language Model

The Emergence of Reproducibility and Consistency in Diffusion Models

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

A Global Geometric Analysis of Maximal Coding Rate Reduction

Towards Unified Depth and Semantic Prediction From a Single Image

Efficient SDP Inference for Fully-Connected CRFs Based on Low-Rank Decomposition

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

Multi-Attention Network for One Shot Learning

Joint Multi-Person Pose Estimation and Semantic Part Segmentation

LEGO: Learning Edge With Geometry All at Once by Watching Videos

MaskLab: Instance Segmentation by Refining Object Detection With Semantic and Direction Features

View Extrapolation of Human Body From a Single Image

Occlusion Aware Unsupervised Learning of Optical Flow

DeLS-3D: Deep Localization and Segmentation With a 3D Semantic Map

Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning

Visual Question Answering With Memory-Augmented Networks

Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks

Multi-Label Image Recognition With Graph Convolutional Networks

ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving

Visual Question Answering as Reading Comprehension

UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos

Anisotropic Convolutional Networks for 3D Semantic Scene Completion

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs

Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension

NAS-FCOS: Fast Neural Architecture Search for Object Detection

3D Part Guided Image Editing for Fine-Grained Object Understanding

Contrastive Learning Based Hybrid Networks for Long-Tailed Image Classification

HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers

Neural Rays for Occlusion-Aware Image-Based Rendering

Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization

Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification

Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer

NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night

HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation

Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation

BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

Learning Conditional Attributes for Compositional Zero-Shot Learning

Glocal Energy-Based Learning for Few-Shot Open-Set Recognition

Unlocking Generalization Power in LiDAR Point Cloud Registration

NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies

A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

Joint Object and Part Segmentation Using Deep Learned Potentials

Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks

Vehicle Re-Identification in Aerial Imagery: Dataset and Approach

Continual Neural Mapping: Learning an Implicit Scene Representation From Sequential Observations

AerialVLN: Vision-and-Language Navigation for UAVs

Batch-based Model Registration for Fast 3D Sherd Reconstruction

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Dynamically Transformed Instance Normalization Network for Generalizable Person Re-identification

Levenshtein OCR

Multi-Granularity Prediction for Scene Text Recognition

NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors

SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views

DistPro: Searching a Fast Knowledge Distillation Process via Meta Optimization

A Simple and Robust Correlation Filtering Method for Text-Based Person Search

F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories

SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Dual Diffusion for Unified Image Generation and Understanding

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association

RayZer: A Self-supervised Large View Synthesis Model

A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness

Implicit Counterfactual Learning for Audio-Visual Segmentation

Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval

A Lightweight Sparse Interaction Network for Time Series Forecasting

OntoFact: Unveiling Fantastic Fact-Skeleton of LLMs via Ontology-Driven Reinforcement Learning

ConsistNER: Towards Instructive NER Demonstrations for LLMs with the Consistency of Ontology and Context

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

HumanLiker: A Human-like Object Detector to Model the Manual Labeling Process

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing