Jiaya Jia

128 Papers

1,266 Total Citations

Papers (128)

LISA: Reasoning Segmentation via Large Language Model

Video-P2P: Video Editing with Cross-attention Control

Visual Question Answering with Question Representation Update (QRU)

Unified Language-driven Zero-shot Domain Adaptation

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Generative Video Propagation

Image Inpainting via Iteratively Decoupled Probabilistic Modeling

DreamOmni: Unified Image Generation and Editing

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

Facelet-Bank for Fast Portrait Manipulation

Referring Image Segmentation via Recurrent Refinement Networks

Scale-Recurrent Network for Deep Image Deblurring

Path Aggregation Network for Instance Segmentation

Semi-Parametric Image Synthesis

Wide-Context Semantic Image Extrapolation

Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation

Amodal Instance Segmentation With KINS Dataset

Dynamic Scene Deblurring With Parameter Selective Sharing and Nested Skip Connections

Associatively Segmenting Instances and Semantics in Point Clouds

Learning Shape-Aware Embedding for Scene Text Detection

PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing

Underexposed Photo Enhancement Using Deep Illumination Estimation

3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis

Semantic Component Decomposition for Face Attribute Manipulation

Domain Adaptive Image-to-Image Translation

Attentive Normalization for Conditional Image Generation

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

Exploring Self-Attention for Image Recognition

DSGN: Deep Stereo Geometry Network for 3D Object Detection

3DSSD: Point-Based 3D Single Stage Object Detector

Distilling Knowledge via Knowledge Review

Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency

Jigsaw Clustering for Unsupervised Visual Representation Learning

Self-Supervised 3D Mesh Reconstruction From Single Images

Multi-Scale Aligned Distillation for Low-Resolution Detection

Scale-Aware Automatic Augmentation for Object Detection

Fully Convolutional Networks for Panoptic Segmentation

MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution

Bidirectional Projection Network for Cross Dimension Scene Understanding

Improving Calibration for Long-Tailed Recognition

TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation

EfficientNeRF: Efficient Neural Radiance Fields

Voxel Field Fusion for 3D Object Detection

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

A Unified Query-Based Paradigm for Point Cloud Understanding

Generalized Few-Shot Semantic Segmentation

Video Frame Interpolation With Transformer

Focal Sparse Convolutional Networks for 3D Object Detection

Multi-View Transformer for 3D Visual Grounding

High Quality Segmentation for Ultra High-Resolution Images

SNR-Aware Low-Light Image Enhancement

Stratified Transformer for 3D Point Cloud Segmentation

Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need

Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields

Spherical Transformer for LiDAR-Based 3D Recognition

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs

TriVol: Point Cloud Rendering via Triple Volumes

Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

Hierarchical Dense Correlation Distillation for Few-Shot Segmentation

Command-Driven Articulated Object Understanding and Manipulation

Video Super-Resolution via Deep Draft-Ensemble Learning

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours

Box Aggregation for Proposal Decimation: Last Mile of Object Detection

Semantic Segmentation With Object Clique Potential

Understanding and Diagnosing Visual Tracking Systems

Mutual-Structure for Joint Filtering

Zero-Order Reverse Filtering

Unsupervised Learning of Stereo Matching

High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits

SGN: Sequential Grouping Networks for Instance Segmentation

Situation Recognition With Graph Neural Networks

Detail-Revealing Deep Video Super-Resolution

Makeup-Go: Blind Reversion of Portrait Edit

3D Graph Neural Networks for RGBD Semantic Segmentation

STD: Sparse-to-Dense 3D Object Detector for Point Cloud

Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation

AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation

Attribute-Driven Spontaneous Motion in Unpaired Image Translation

Fast and Practical Neural Architecture Search

View Independent Generative Adversarial Network for Novel View Synthesis

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

Image Synthesis via Semantic Composition

Guided Point Contrastive Learning for Semi-Supervised Point Cloud Semantic Segmentation

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

Deep Structured Instance Graph for Distilling Object Detectors

Video Instance Segmentation With a Propose-Reduce Paradigm

Learnable Boundary Guided Adversarial Training

Point Transformer

Seeing Dynamic Scene in the Dark: A High-Quality Video Dataset With Mechatronic Alignment

Parametric Contrastive Learning

Removing Anomalies as Noises for Industrial Defect Localization

Mask-Attention-Free Transformer for 3D Instance Segmentation

End-to-end 3D Tracking with Decoupled Queries

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

High Quality Entity Segmentation

Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References

MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution

CN: Channel Normalization For Point Cloud Recognition

Memory Selection Network for Video Propagation

VCNet: A Robust Approach to Blind Image Inpainting

Tracking Objects As Pixel-Wise Distributions

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

Fast Point R-CNN

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Just Noticeable Defocus Blur Detection and Estimation

Deep LAC: Deep Localization, Alignment and Classification for Fine-Grained Recognition

Handling Motion Blur in Multi-Frame Super-Resolution

Multi-Scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

Pyramid Scene Parsing Network

Image Inpainting via Generative Multi-column Convolutional Neural Networks

Sequential Context Encoding for Duplicate Removal

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond

Blending Anti-Aliasing into Vision Transformer

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Real-World Image Variation by Aligning Diffusion Inversion Chain

DiffComplete: Diffusion-based Generative 3D Shape Completion

Deep Edge-Aware Filters