Yaowei Wang

49 Papers

247 Total Citations

Papers (49)

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues

Video Language Model Pretraining with Spatio-temporal Masking

RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model

Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression

Unsupervised Degradation Representation Aware Transform for Real-World Blind Image Super-Resolution

Pilot: Building the Federated Multimodal Instruction Tuning Framework

Mixed-Effects Contextual Bandits

RTracker: Recoverable Tracking via PN Tree Structured Memory

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

Modality-Collaborative Test-Time Adaptation for Action Recognition

Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering

Unsupervised Cross-Dataset Transfer Learning for Person Re-Identification

Contrastive Neural Architecture Search With Neural Architecture Comparators

Learning Scalable ℓ∞-Constrained Near-Lossless Image Compression via Joint Lossy Image and Residual Compression

Towards More Flexible and Accurate Object Tracking With Natural Language: Algorithms and Benchmark

Fine-Grained Object Classification via Self-Supervised Pose Alignment

Boosting Crowd Counting via Multifaceted Attention

M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining

Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples

Integrally Pre-Trained Transformer Pyramid Networks

KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection

Exploiting Multi-Grain Ranking Constraints for Precisely Searching Visually-Similar Vehicles

Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network

Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning

Conformer: Local Features Coupling Global Representations for Visual Recognition

Strip-MLP: Efficient Token Interaction for Vision MLP

CiteTracker: Correlating Image and Text for Visual Tracking

Large Batch Optimization for Object Detection: Training COCO in 12 Minutes

An Asymmetric Modeling for Action Assessment

Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance

DAS: Densely-Anchored Sampling for Deep Metric Learning

CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection

AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Building Vision Models upon Heat Conduction

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval

Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation

Learning to Share in Networked Multi-Agent Reinforcement Learning

Learning Mask-aware CLIP Representations for Zero-Shot Segmentation