Zhang

168
Papers
2,855
Total Citations

Papers (168)

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

ICLR 2025, arXiv
351
citations

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

ECCV 2024, arXiv
180
citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025, arXiv
118
citations

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

ECCV 2024, arXiv
114
citations

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

ECCV 2024, arXiv
110
citations

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

ICLR 2025, arXiv
101
citations

MoBA: Mixture of Block Attention for Long-Context LLMs

NeurIPS 2025, arXiv
94
citations

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

ICLR 2025, arXiv
90
citations

PSALM: Pixelwise Segmentation with Large Multi-modal Model

ECCV 2024, arXiv
82
citations

WebDancer: Towards Autonomous Information Seeking Agency

NeurIPS 2025, arXiv
81
citations

MMTEB: Massive Multilingual Text Embedding Benchmark

ICLR 2025, arXiv
74
citations

MagicPIG: LSH Sampling for Efficient LLM Generation

ICLR 2025, arXiv
62
citations

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

NeurIPS 2025, arXiv
57
citations

Self-Improvement in Language Models: The Sharpening Mechanism

ICLR 2025, arXiv
55
citations

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

ICLR 2025, arXiv
53
citations

Catastrophic Failure of LLM Unlearning via Quantization

ICLR 2025, arXiv
43
citations

Stream Query Denoising for Vectorized HD-Map Construction

ECCV 2024, arXiv
40
citations

On the Role of Attention Heads in Large Language Model Safety

ICLR 2025, arXiv
40
citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

ICLR 2025, arXiv
39
citations

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

ICLR 2025, arXiv
34
citations

Generalizable Human Gaussians for Sparse View Synthesis

ECCV 2024, arXiv
34
citations

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

NeurIPS 2025, arXiv
30
citations

Soft Prompt Generation for Domain Generalization

ECCV 2024, arXiv
30
citations

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

NeurIPS 2025, arXiv
29
citations

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

ICLR 2025, arXiv
28
citations

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

NeurIPS 2025, arXiv
25
citations

Energy-Weighted Flow Matching for Offline Reinforcement Learning

ICLR 2025, arXiv
24
citations

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

ICLR 2025
23
citations

SWE-bench Goes Live!

NeurIPS 2025, arXiv
22
citations

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

ICLR 2025, arXiv
22
citations

An Incremental Unified Framework for Small Defect Inspection

ECCV 2024, arXiv
21
citations

GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering

ICLR 2025, arXiv
21
citations

One-Shot Diffusion Mimicker for Handwritten Text Generation

ECCV 2024, arXiv
21
citations

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

ICLR 2025, arXiv
21
citations

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

NeurIPS 2025, arXiv
19
citations

GameArena: Evaluating LLM Reasoning through Live Computer Games

ICLR 2025, arXiv
19
citations

Self-Evolved Reward Learning for LLMs

ICLR 2025, arXiv
18
citations

Implicit Concept Removal of Diffusion Models

ECCV 2024, arXiv
18
citations

Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

ECCV 2024, arXiv
17
citations

MetaOOD: Automatic Selection of OOD Detection Models

ICLR 2025, arXiv
16
citations

Spiking Vision Transformer with Saccadic Attention

ICLR 2025, arXiv
15
citations

RoboScape: Physics-informed Embodied World Model

NeurIPS 2025, arXiv
15
citations

LeVo: High-Quality Song Generation with Multi-Preference Alignment

NeurIPS 2025, arXiv
15
citations

MoVideo: Motion-Aware Video Generation with Diffusion Models

ECCV 2024, arXiv
14
citations

GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity

ECCV 2024, arXiv
14
citations

Quantized Spike-driven Transformer

ICLR 2025, arXiv
14
citations

NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering

NeurIPS 2025, arXiv
14
citations

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

NeurIPS 2025, arXiv
14
citations

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

NeurIPS 2025, arXiv
13
citations

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

ICLR 2025, arXiv
13
citations

UFM: A Simple Path towards Unified Dense Correspondence with Flow

NeurIPS 2025, arXiv
13
citations

SINDER: Repairing the Singular Defects of DINOv2

ECCV 2024, arXiv
12
citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

ICLR 2025, arXiv
12
citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

ECCV 2024, arXiv
11
citations

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

NeurIPS 2025, arXiv
11
citations

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

ICLR 2025, arXiv
11
citations

Monocular Occupancy Prediction for Scalable Indoor Scenes

ECCV 2024, arXiv
11
citations

OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework

ECCV 2024, arXiv
11
citations

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

NeurIPS 2025, arXiv
10
citations

MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment

ECCV 2024, arXiv
10
citations

Few-shot NeRF by Adaptive Rendering Loss Regularization

ECCV 2024, arXiv
10
citations

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model

ICLR 2025, arXiv
10
citations

Test-time Adaptation for Cross-modal Retrieval with Query Shift

ICLR 2025, arXiv
9
citations

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model

ECCV 2024, arXiv
9
citations

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

ECCV 2024, arXiv
9
citations

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

ICLR 2025, arXiv
9
citations

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

NeurIPS 2025, arXiv
9
citations

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

NeurIPS 2025, arXiv
9
citations

Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

ECCV 2024, arXiv
8
citations

Causally Motivated Sycophancy Mitigation for Large Language Models

ICLR 2025
8
citations

PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

ECCV 2024, arXiv
8
citations

What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context

ICLR 2025, arXiv
7
citations

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

ICLR 2025, arXiv
7
citations

Learning Cross-hand Policies of High-DOF Reaching and Grasping

ECCV 2024, arXiv
7
citations

IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

ICLR 2025, arXiv
7
citations

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

ECCV 2024, arXiv
6
citations

Occlusion-Aware Seamless Segmentation

ECCV 2024, arXiv
6
citations

Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection

ECCV 2024, arXiv
6
citations

LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang

ECCV 2024
6
citations

DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement

ECCV 2024, arXiv
6
citations

MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

NeurIPS 2025, arXiv
6
citations

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

ICLR 2025, arXiv
6
citations

Integrative Decoding: Improving Factuality via Implicit Self-consistency

ICLR 2025, arXiv
6
citations

ELICIT: LLM Augmentation Via External In-context Capability

ICLR 2025, arXiv
6
citations

GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

ICLR 2025, arXiv
6
citations

SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision

ICLR 2025, arXiv
5
citations

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

NeurIPS 2025, arXiv
5
citations

Hessian-Free Online Certified Unlearning

ICLR 2025, arXiv
5
citations

Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning

ECCV 2024, arXiv
5
citations

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

ECCV 2024, arXiv
5
citations

Learning Graph Invariance by Harnessing Spuriosity

ICLR 2025
5
citations

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

ICLR 2025, arXiv
5
citations

Noisy Test-Time Adaptation in Vision-Language Models

ICLR 2025, arXiv
4
citations

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning

NeurIPS 2025, arXiv
4
citations

Dynamic Risk Assessments for Offensive Cybersecurity Agents

NeurIPS 2025, arXiv
4
citations

SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups

ICLR 2025, arXiv
4
citations

Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers

ECCV 2024
4
citations

Estimation and Inference in Distributional Reinforcement Learning

NeurIPS 2025, arXiv
4
citations

Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks

ECCV 2024, arXiv
3
citations

Attention! Your Vision Language Model Could Be Maliciously Manipulated

NeurIPS 2025, arXiv
3
citations

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

NeurIPS 2025, arXiv
3
citations

MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation

NeurIPS 2025, arXiv
3
citations

Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM

NeurIPS 2025, arXiv
3
citations

Homomorphism Expressivity of Spectral Invariant Graph Neural Networks

ICLR 2025, arXiv
3
citations

STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization

NeurIPS 2025, arXiv
3
citations

RLZero: Direct Policy Inference from Language Without In-Domain Supervision

NeurIPS 2025, arXiv
3
citations

CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling

NeurIPS 2025, arXiv
3
citations

Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space

NeurIPS 2025, arXiv
3
citations

RESAnything: Attribute Prompting for Arbitrary Referring Segmentation

NeurIPS 2025, arXiv
2
citations

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation

ICLR 2025, arXiv
2
citations

A Statistical Approach for Controlled Training Data Detection

ICLR 2025
2
citations

One Filters All: A Generalist Filter For State Estimation

NeurIPS 2025, arXiv
2
citations

Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents

NeurIPS 2025, arXiv
2
citations

BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks

NeurIPS 2025, arXiv
2
citations

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

NeurIPS 2025, arXiv
2
citations

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

ICLR 2025, arXiv
2
citations

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

NeurIPS 2025, arXiv
2
citations

OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

ECCV 2024, arXiv
2
citations

See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction

NeurIPS 2025, arXiv
2
citations

Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM

NeurIPS 2025
2
citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

ICLR 2025, arXiv
2
citations

Test-time Adaptation for Image Compression with Distribution Regularization

ICLR 2025, arXiv
2
citations

A Conditional Independence Test in the Presence of Discretization

ICLR 2025, arXiv
2
citations

Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis

ICLR 2025, arXiv
2
citations

Alignment of Large Language Models with Constrained Learning

NeurIPS 2025, arXiv
2
citations

S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning

NeurIPS 2025, arXiv
2
citations

PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph

ICLR 2025, arXiv
1
citation

MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations

ICLR 2025
1
citation

OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps

NeurIPS 2025, arXiv
1
citation

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

ICLR 2025
1
citation

UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

NeurIPS 2025, arXiv
1
citation

Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning

NeurIPS 2025, arXiv
1
citation

Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving

NeurIPS 2025, arXiv
1
citation

Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation

NeurIPS 2025, arXiv
1
citation

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

NeurIPS 2025, arXiv
1
citation

Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling

NeurIPS 2025, arXiv
1
citation

Dependency-aware Differentiable Neural Architecture Search

ECCV 2024
1
citation

Controlled LLM Decoding via Discrete Auto-regressive Biasing

ICLR 2025, arXiv
1
citation

Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion

NeurIPS 2025, arXiv
1
citation

Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization

NeurIPS 2025, arXiv
1
citation

Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining

NeurIPS 2025, arXiv
1
citation

FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network

NeurIPS 2025, arXiv
0
citations

Faithful Group Shapley Value

NeurIPS 2025, arXiv
0
citations

Variational Task Vector Composition

NeurIPS 2025, arXiv
0
citations

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

NeurIPS 2025, arXiv
0
citations

Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning

NeurIPS 2025, arXiv
0
citations

Stop DDoS Attacking the Research Community with AI-Generated Survey Papers

NeurIPS 2025, arXiv
0
citations

Probing Neural Combinatorial Optimization Models

NeurIPS 2025, arXiv
0
citations

ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation

NeurIPS 2025, arXiv
0
citations

PID-controlled Langevin Dynamics for Faster Sampling on Generative Models

NeurIPS 2025, arXiv
0
citations

Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation

NeurIPS 2025, arXiv
0
citations

Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data

NeurIPS 2025, arXiv
0
citations

Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos

NeurIPS 2025, arXiv
0
citations

Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning

NeurIPS 2025, arXiv
0
citations

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

NeurIPS 2025, arXiv
0
citations

NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval

NeurIPS 2025, arXiv
0
citations

EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis

NeurIPS 2025, arXiv
0
citations

FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning

NeurIPS 2025, arXiv
0
citations

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

NeurIPS 2025, arXiv
0
citations

OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction

NeurIPS 2025, arXiv
0
citations

Order-Level Attention Similarity Across Language Models: A Latent Commonality

NeurIPS 2025, arXiv
0
citations

Off-policy Reinforcement Learning with Model-based Exploration Augmentation

NeurIPS 2025, arXiv
0
citations

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

ICLR 2025
0
citations

mmWalk: Towards Multi-modal Multi-view Walking Assistance

NeurIPS 2025, arXiv
0
citations

Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset

ICLR 2025, arXiv
0
citations

F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning

NeurIPS 2025, arXiv
0
citations

MuSLR: Multimodal Symbolic Logical Reasoning

NeurIPS 2025, arXiv
0
citations

AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation

NeurIPS 2025, arXiv
0
citations