Jian Yang

Papers: 63

Total Citations: 485

Papers (63)

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

OmniBench: Towards The Future of Universal Omni-Language Models

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

McEval: Massively Multilingual Code Evaluation

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

From Words to Worth: Newborn Article Impact Prediction with LLM

RNG: Relightable Neural Gaussians

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Fundamental Matrix Estimation Using Relative Depths

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model

SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model

Relaxed Rotational Equivariance via G-Biases in Vision

Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability

Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent

StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning

Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection

Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation

Reverse Convolution and Its Applications to Image Restoration

Towards Better Spherical Sliced-Wasserstein Distance Learning with Data-Adaptive Discriminative Projection Direction

Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow

XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective

Fine-Tuning Language Models with Collaborative and Semantic Experts

MCL-NER: Cross-Lingual Named Entity Recognition via Multi-View Contrastive Learning

SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

Hyperbolic Graph Diffusion Model

Divide and Conquer: Hybrid Pre-training for Person Search

SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-resolution

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Multi-Attribute Interactions Matter for 3D Visual Grounding

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance

LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Generative Point Cloud Registration

Sketchy Bounding-box Supervision for 3D Instance Segmentation

HORP: Human-Object Relation Priors Guided HOI Detection

Three-view Focal Length Recovery From Homographies

DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Straighten Viscous Rectified Flow via Noise Optimization

GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views

RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation

OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios

WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion

Dual Manifold Regularization Steered Robust Representation Learning for Point Cloud Analysis

Harmonious Music-driven Group Choreography with Trajectory-Controllable Diffusion

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models