Yang Liu

112

Papers

1,240

Total Citations

2

Affiliations

Affiliations

School of Computer Science and Technology

Harbin Institute of Technology

Papers (112)

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Space Group Constrained Crystal Generation

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Exploring Enhanced Contextual Information for Video-Level Object Tracking

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Perception-Guided Jailbreak Against Text-to-Image Models

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

ZeroFlow: Scalable Scene Flow via Distillation

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning

Post-hoc bias scoring is optimal for fair classification

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Cross-modal Causal Relation Alignment for Video Question Grounding

Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion

ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer

Dynamic Graph Learning with Static Relations for Credit Risk Assessment

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

Novel Class Discovery in Chest X-rays via Paired Images and Text

Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

Adversarial Robust Memory-Based Continual Learner

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models

Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation

Hybrid Concept Bottleneck Models

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

AdsQA: Towards Advertisement Video Understanding

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization

Exploring Structural Degradation in Dense Representations for Self-supervised Learning

Fair Participation via Sequential Policies

Learning Counterfactual Outcomes Under Rank Preservation

HSI: A Holistic Style Injector for Arbitrary Style Transfer

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models

Human and AI Perceptual Differences in Image Classification Errors

Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting

FastPERT: Towards Fast Microservice Application Latency Prediction via Structural Inductive Bias over PERT Networks

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

S^3cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners

Unified Open-World Segmentation with Multi-Modal Prompts

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Comprehensive Visual Grounding for Video Description

FedMut: Generalized Federated Learning via Stochastic Mutation

Knowledge Graph Error Detection with Contrastive Confidence Adaption

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Multi-scenario Overlapping Text Segmentation with Depth Awareness

End-to-End Driving with Online Trajectory Evaluation via BEV World Model

Semantic-Guided Novel Category Discovery

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Diff-BGM: A Diffusion Model for Video Background Music Generation

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

DisTime: Distribution-based Time Representation for Video Large Language Models

Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Learning Visual Proxy for Compositional Zero-Shot Learning

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

Position: Towards Unified Alignment Between Agents, Humans, and Environment

Performative Prediction with Bandit Feedback: Learning through Reparameterization

Multi-View Clustering by Inter-cluster Connectivity Guided Reward

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

Graph Distillation with Eigenbasis Matching

Semantic-Aware Human Object Interaction Image Generation

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

Neural Jump-Diffusion Temporal Point Processes

Generative Active Learning for Long-tailed Instance Segmentation

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Equivariant Diffusion for Crystal Structure Prediction

DOF-Separation for 3D Manipulation in XR: Understanding Finger-Wrist Separation to Simultaneously Translate and Rotate Objects

Improving Neural Logic Machines via Failure Reflection

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

Cross-Subject Cognitive Load Recognition in VR Using Multimodal Fusion with EEG and Eye-Tracking

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

PlanLLM: Video Procedure Planning with Refinable Large Language Models