Yang Liu

112
Papers
1,240
Total Citations
2
Affiliations

Affiliations

School of Computer Science and Technology
Harbin Institute of Technology

Papers (112)

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

ICLR 2024
309
citations

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

CVPR 2024
237
citations

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

ICLR 2025
115
citations

Space Group Constrained Crystal Generation

ICLR 2024
60
citations

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

CVPR 2024
49
citations

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

CVPR 2024
32
citations

Exploring Enhanced Contextual Information for Video-Level Object Tracking

AAAI 2025
27
citations

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

AAAI 2025
27
citations

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

AAAI 2024, arXiv
26
citations

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

ICLR 2024
26
citations

Perception-Guided Jailbreak Against Text-to-Image Models

AAAI 2025
26
citations

Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning

ICML 2025
22
citations

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

CVPR 2024
20
citations

FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

AAAI 2024, arXiv
19
citations

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

NeurIPS 2025
18
citations

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

CVPR 2024
18
citations

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

CVPR 2024
17
citations

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

ICLR 2025
13
citations

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

AAAI 2024, arXiv
12
citations

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

NeurIPS 2025
12
citations

ZeroFlow: Scalable Scene Flow via Distillation

ICLR 2024
12
citations

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

CVPR 2025
11
citations

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning

CVPR 2025
10
citations

Post-hoc bias scoring is optimal for fair classification

ICLR 2024
10
citations

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

CVPR 2025
9
citations

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

CVPR 2024
9
citations

Cross-modal Causal Relation Alignment for Video Question Grounding

CVPR 2025
7
citations

Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion

ICML 2025
7
citations

ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer

CVPR 2025
7
citations

Dynamic Graph Learning with Static Relations for Credit Risk Assessment

AAAI 2025
6
citations

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment

AAAI 2025
6
citations

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

AAAI 2025
6
citations

Novel Class Discovery in Chest X-rays via Paired Images and Text

AAAI 2024
5
citations

Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

AAAI 2024, arXiv
5
citations

Adversarial Robust Memory-Based Continual Learner

ICCV 2025
5
citations

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

CVPR 2025
4
citations

Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

CVPR 2025
4
citations

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges

ICCV 2025
3
citations

CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models

CVPR 2025
3
citations

Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval

AAAI 2025
3
citations

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

AAAI 2025
3
citations

WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation

CVPR 2025
3
citations

Hybrid Concept Bottleneck Models

CVPR 2025
2
citations

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

ICCV 2025, arXiv
2
citations

AdsQA: Towards Advertisement Video Understanding

ICCV 2025
2
citations

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

NeurIPS 2025
2
citations

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

NeurIPS 2025
2
citations

InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization

ICLR 2025
1
citation

Exploring Structural Degradation in Dense Representations for Self-supervised Learning

NeurIPS 2025
1
citation

Fair Participation via Sequential Policies

AAAI 2024
1
citation

Learning Counterfactual Outcomes Under Rank Preservation

NeurIPS 2025
1
citation

HSI: A Holistic Style Injector for Arbitrary Style Transfer

CVPR 2025
1
citation

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

AAAI 2024, arXiv
1
citation

UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules

ICML 2025
1
citation

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models

NeurIPS 2025
0
citations

Human and AI Perceptual Differences in Image Classification Errors

AAAI 2025
0
citations

Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting

NeurIPS 2025
0
citations

FastPERT: Towards Fast Microservice Application Latency Prediction via Structural Inductive Bias over PERT Networks

AAAI 2025
0
citations

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

AAAI 2025
0
citations

S³cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners

AAAI 2025
0
citations

Unified Open-World Segmentation with Multi-Modal Prompts

ICCV 2025
0
citations

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for

AAAI 2024
0
citations

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

ICCV 2025
0
citations

Comprehensive Visual Grounding for Video Description

AAAI 2024
0
citations

FedMut: Generalized Federated Learning via Stochastic Mutation

AAAI 2024
0
citations

Knowledge Graph Error Detection with Contrastive Confidence Adaption

AAAI 2024
0
citations

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

ICCV 2025
0
citations

Multi-scenario Overlapping Text Segmentation with Depth Awareness

ICCV 2025
0
citations

End-to-End Driving with Online Trajectory Evaluation via BEV World Model

ICCV 2025
0
citations

Semantic-Guided Novel Category Discovery

AAAI 2024
0
citations

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes

ICCV 2025
0
citations

Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

ICCV 2025
0
citations

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

CVPR 2024
0
citations

Diff-BGM: A Diffusion Model for Video Background Music Generation

CVPR 2024
0
citations

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

CVPR 2024
0
citations

DisTime: Distribution-based Time Representation for Video Large Language Models

ICCV 2025
0
citations

Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching

ICCV 2025
0
citations

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

CVPR 2024
0
citations

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

CVPR 2024
0
citations

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

CVPR 2024
0
citations

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

ICCV 2025, arXiv
0
citations

GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing

ICCV 2025
0
citations

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring

ICCV 2025
0
citations

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

CVPR 2024
0
citations

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

CVPR 2024
0
citations

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

ICCV 2025
0
citations

AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning

ICCV 2025
0
citations

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

ICCV 2025
0
citations

Learning Visual Proxy for Compositional Zero-Shot Learning

ICCV 2025
0
citations

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

CVPR 2025
0
citations

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

CVPR 2025
0
citations

Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints

ICML 2025
0
citations

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering

CVPR 2025
0
citations

Position: Towards Unified Alignment Between Agents, Humans, and Environment

ICML 2024
0
citations

Performative Prediction with Bandit Feedback: Learning through Reparameterization

ICML 2024
0
citations

Multi-View Clustering by Inter-cluster Connectivity Guided Reward

ICML 2024
0
citations

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

ICML 2024
0
citations

Graph Distillation with Eigenbasis Matching

ICML 2024
0
citations

Semantic-Aware Human Object Interaction Image Generation

ICML 2024
0
citations

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

ICML 2024
0
citations

Neural Jump-Diffusion Temporal Point Processes

ICML 2024
0
citations

Generative Active Learning for Long-tailed Instance Segmentation

ICML 2024
0
citations

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

ICML 2024
0
citations

Equivariant Diffusion for Crystal Structure Prediction

ICML 2024
0
citations

DOF-Separation for 3D Manipulation in XR: Understanding Finger-Wrist Separation to Simultaneously Translate and Rotate Objects

ISMAR 2025
0
citations

Improving Neural Logic Machines via Failure Reflection

ICML 2024
0
citations

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

AAAI 2025
0
citations

Cross-Subject Cognitive Load Recognition in VR Using Multimodal Fusion with EEG and Eye-Tracking

ISMAR 2025
0
citations

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

AAAI 2025
0
citations

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

AAAI 2025
0
citations

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

AAAI 2025
0
citations

PlanLLM: Video Procedure Planning with Refinable Large Language Models

AAAI 2025
0
citations