Yu Zhang

71 Papers

1,223 Total Citations

Papers (71)

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

NeurIPS 2017, arXiv

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching

Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

BHViT: Binarized Hybrid Vision Transformer

SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images

Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs Without Real Data Replay

MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition

Object-level Correlation for Few-Shot Segmentation

Open Your Eyes: Vision Enhances Message Passing Neural Networks in Link Prediction

Semantic Object Segmentation via Detection in Weakly Labeled Video

3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion

Exploit Bounding Box Annotations for Multi-Label Object Recognition

What Is and What Is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors

Causes and Corrections for Bimodal Multi-Path Scanning With Structured Light

Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching

Learning Event-Based Motion Deblurring

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection

Sparse Multi-Path Corrections in Fringe Projection Profilometry

Balanced and Hierarchical Relation Learning for One-Shot Object Detection

AutoMine: An Unmanned Mine Dataset

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration

Leveraging per Image-Token Consistency for Vision-Language Pre-Training

Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation

Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector

Multi-Class Part Parsing With Joint Boundary-Semantic Awareness

Training Weakly Supervised Video Frame Interpolation With Events

Personalized Image Semantic Segmentation

E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images

Learning Trajectory-Word Alignments for Video-Language Tasks

Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields

Multi-view Self-supervised Disentanglement for General Image Denoising

Deep Image Clustering with Category-Style Representation

Learning to See in the Dark with Events

PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry

Deep Bayesian Video Frame Interpolation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

An Efficient Person Clustering Algorithm for Open Checkout-Free Groceries

Selectivity or Invariance: Boundary-Aware Salient Object Detection

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

PLAN: Proactive Low-Rank Allocation for Continual Learning

HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation

Adaptive Wavelet-Positional Encoding for High-Frequency Information Learning in Implicit Neural Representation

Multi-Label Ranking Loss Minimization for Matrix Completion

SDAC: A Multimodal Synthetic Dataset for Anomaly and Corner Case Detection in Autonomous Driving

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

Memory-Efficient Reversible Spiking Neural Networks

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Learning to Multitask

Multi-Objective Meta Learning

Effective Meta-Regularization by Kernelized Proximal Regularization

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Dynamic Sparse Network for Time Series Classification: Learning What to “See”

Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection

Interpreting Unsupervised Anomaly Detection in Security via Rule Extraction

MG-ViT: A Multi-Granularity Method for Compact and Efficient Vision Transformers

Transfer Learning via Learning to Transfer

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis