Zhang
168 Papers
2,855 Total Citations
Papers (168)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
ICLR 2025arXiv
351
citations
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
ECCV 2024arXiv
180
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025arXiv
118
citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
ECCV 2024arXiv
114
citations
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
ECCV 2024arXiv
110
citations
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
ICLR 2025arXiv
101
citations
MoBA: Mixture of Block Attention for Long-Context LLMs
NeurIPS 2025arXiv
94
citations
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
ICLR 2025arXiv
90
citations
PSALM: Pixelwise Segmentation with Large Multi-modal Model
ECCV 2024arXiv
82
citations
WebDancer: Towards Autonomous Information Seeking Agency
NeurIPS 2025arXiv
81
citations
MMTEB: Massive Multilingual Text Embedding Benchmark
ICLR 2025arXiv
74
citations
MagicPIG: LSH Sampling for Efficient LLM Generation
ICLR 2025arXiv
62
citations
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
NeurIPS 2025arXiv
57
citations
Self-Improvement in Language Models: The Sharpening Mechanism
ICLR 2025arXiv
55
citations
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
ICLR 2025arXiv
53
citations
Catastrophic Failure of LLM Unlearning via Quantization
ICLR 2025arXiv
43
citations
Stream Query Denoising for Vectorized HD-Map Construction
ECCV 2024arXiv
40
citations
On the Role of Attention Heads in Large Language Model Safety
ICLR 2025arXiv
40
citations
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
ICLR 2025arXiv
39
citations
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
ICLR 2025arXiv
34
citations
Generalizable Human Gaussians for Sparse View Synthesis
ECCV 2024arXiv
34
citations
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
NeurIPS 2025arXiv
30
citations
Soft Prompt Generation for Domain Generalization
ECCV 2024arXiv
30
citations
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
NeurIPS 2025arXiv
29
citations
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
ICLR 2025arXiv
28
citations
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation
NeurIPS 2025arXiv
25
citations
Energy-Weighted Flow Matching for Offline Reinforcement Learning
ICLR 2025arXiv
24
citations
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
ICLR 2025
23
citations
SWE-bench Goes Live!
NeurIPS 2025arXiv
22
citations
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
ICLR 2025arXiv
22
citations
An Incremental Unified Framework for Small Defect Inspection
ECCV 2024arXiv
21
citations
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
ICLR 2025arXiv
21
citations
One-Shot Diffusion Mimicker for Handwritten Text Generation
ECCV 2024arXiv
21
citations
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
ICLR 2025arXiv
21
citations
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
NeurIPS 2025arXiv
19
citations
GameArena: Evaluating LLM Reasoning through Live Computer Games
ICLR 2025arXiv
19
citations
Self-Evolved Reward Learning for LLMs
ICLR 2025arXiv
18
citations
Implicit Concept Removal of Diffusion Models
ECCV 2024arXiv
18
citations
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
ECCV 2024arXiv
17
citations
MetaOOD: Automatic Selection of OOD Detection Models
ICLR 2025arXiv
16
citations
Spiking Vision Transformer with Saccadic Attention
ICLR 2025arXiv
15
citations
RoboScape: Physics-informed Embodied World Model
NeurIPS 2025arXiv
15
citations
LeVo: High-Quality Song Generation with Multi-Preference Alignment
NeurIPS 2025arXiv
15
citations
MoVideo: Motion-Aware Video Generation with Diffusion Models
ECCV 2024arXiv
14
citations
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
ECCV 2024arXiv
14
citations
Quantized Spike-driven Transformer
ICLR 2025arXiv
14
citations
NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering
NeurIPS 2025arXiv
14
citations
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
NeurIPS 2025arXiv
14
citations
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
NeurIPS 2025arXiv
13
citations
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
ICLR 2025arXiv
13
citations
UFM: A Simple Path towards Unified Dense Correspondence with Flow
NeurIPS 2025arXiv
13
citations
SINDER: Repairing the Singular Defects of DINOv2
ECCV 2024arXiv
12
citations
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
ICLR 2025arXiv
12
citations
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
ECCV 2024arXiv
11
citations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
NeurIPS 2025arXiv
11
citations
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
ICLR 2025arXiv
11
citations
Monocular Occupancy Prediction for Scalable Indoor Scenes
ECCV 2024arXiv
11
citations
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
ECCV 2024arXiv
11
citations
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
NeurIPS 2025arXiv
10
citations
MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
ECCV 2024arXiv
10
citations
Few-shot NeRF by Adaptive Rendering Loss Regularization
ECCV 2024arXiv
10
citations
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
ICLR 2025arXiv
10
citations
Test-time Adaptation for Cross-modal Retrieval with Query Shift
ICLR 2025arXiv
9
citations
Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
ECCV 2024arXiv
9
citations
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
ECCV 2024arXiv
9
citations
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
ICLR 2025arXiv
9
citations
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
NeurIPS 2025arXiv
9
citations
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
NeurIPS 2025arXiv
9
citations
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
ECCV 2024arXiv
8
citations
Causally Motivated Sycophancy Mitigation for Large Language Models
ICLR 2025
8
citations
PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
ECCV 2024arXiv
8
citations
What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
ICLR 2025arXiv
7
citations
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
ICLR 2025arXiv
7
citations
Learning Cross-hand Policies of High-DOF Reaching and Grasping
ECCV 2024arXiv
7
citations
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
ICLR 2025arXiv
7
citations
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
ECCV 2024arXiv
6
citations
Occlusion-Aware Seamless Segmentation
ECCV 2024arXiv
6
citations
Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
ECCV 2024arXiv
6
citations
LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
ECCV 2024
6
citations
DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
ECCV 2024arXiv
6
citations
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
NeurIPS 2025arXiv
6
citations
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
ICLR 2025arXiv
6
citations
Integrative Decoding: Improving Factuality via Implicit Self-consistency
ICLR 2025arXiv
6
citations
ELICIT: LLM Augmentation Via External In-context Capability
ICLR 2025arXiv
6
citations
GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
ICLR 2025arXiv
6
citations
SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
ICLR 2025arXiv
5
citations
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
NeurIPS 2025arXiv
5
citations
Hessian-Free Online Certified Unlearning
ICLR 2025arXiv
5
citations
Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
ECCV 2024arXiv
5
citations
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
ECCV 2024arXiv
5
citations
Learning Graph Invariance by Harnessing Spuriosity
ICLR 2025
5
citations
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
ICLR 2025arXiv
5
citations
Noisy Test-Time Adaptation in Vision-Language Models
ICLR 2025arXiv
4
citations
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
NeurIPS 2025arXiv
4
citations
Dynamic Risk Assessments for Offensive Cybersecurity Agents
NeurIPS 2025arXiv
4
citations
SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
ICLR 2025arXiv
4
citations
Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers
ECCV 2024
4
citations
Estimation and Inference in Distributional Reinforcement Learning
NeurIPS 2025arXiv
4
citations
Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
ECCV 2024arXiv
3
citations
Attention! Your Vision Language Model Could Be Maliciously Manipulated
NeurIPS 2025arXiv
3
citations
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
NeurIPS 2025arXiv
3
citations
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
NeurIPS 2025arXiv
3
citations
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
NeurIPS 2025arXiv
3
citations
Homomorphism Expressivity of Spectral Invariant Graph Neural Networks
ICLR 2025arXiv
3
citations
STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
NeurIPS 2025arXiv
3
citations
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
NeurIPS 2025arXiv
3
citations
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
NeurIPS 2025arXiv
3
citations
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
NeurIPS 2025arXiv
3
citations
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
NeurIPS 2025arXiv
2
citations
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
ICLR 2025arXiv
2
citations
A Statistical Approach for Controlled Training Data Detection
ICLR 2025
2
citations
One Filters All: A Generalist Filter For State Estimation
NeurIPS 2025arXiv
2
citations
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
NeurIPS 2025arXiv
2
citations
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
NeurIPS 2025arXiv
2
citations
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
NeurIPS 2025arXiv
2
citations
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
ICLR 2025arXiv
2
citations
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
NeurIPS 2025arXiv
2
citations
OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
ECCV 2024arXiv
2
citations
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
NeurIPS 2025arXiv
2
citations
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
NeurIPS 2025
2
citations
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
ICLR 2025arXiv
2
citations
Test-time Adaptation for Image Compression with Distribution Regularization
ICLR 2025arXiv
2
citations
A Conditional Independence Test in the Presence of Discretization
ICLR 2025arXiv
2
citations
Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis
ICLR 2025arXiv
2
citations
Alignment of Large Language Models with Constrained Learning
NeurIPS 2025arXiv
2
citations
S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning
NeurIPS 2025arXiv
2
citations
PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
ICLR 2025arXiv
1
citations
MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations
ICLR 2025
1
citations
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
NeurIPS 2025arXiv
1
citations
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
ICLR 2025
1
citations
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
NeurIPS 2025arXiv
1
citations
Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning
NeurIPS 2025arXiv
1
citations
Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving
NeurIPS 2025arXiv
1
citations
Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation
NeurIPS 2025arXiv
1
citations
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
NeurIPS 2025arXiv
1
citations
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
NeurIPS 2025arXiv
1
citations
Dependency-aware Differentiable Neural Architecture Search
ECCV 2024
1
citations
Controlled LLM Decoding via Discrete Auto-regressive Biasing
ICLR 2025arXiv
1
citations
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
NeurIPS 2025arXiv
1
citations
Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
NeurIPS 2025arXiv
1
citations
Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining
NeurIPS 2025arXiv
1
citations
FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network
NeurIPS 2025arXiv
0
citations
Faithful Group Shapley Value
NeurIPS 2025arXiv
0
citations
Variational Task Vector Composition
NeurIPS 2025arXiv
0
citations
Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment
NeurIPS 2025arXiv
0
citations
Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning
NeurIPS 2025arXiv
0
citations
Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
NeurIPS 2025arXiv
0
citations
Probing Neural Combinatorial Optimization Models
NeurIPS 2025arXiv
0
citations
ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation
NeurIPS 2025arXiv
0
citations
PID-controlled Langevin Dynamics for Faster Sampling on Generative Models
NeurIPS 2025arXiv
0
citations
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
NeurIPS 2025arXiv
0
citations
Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data
NeurIPS 2025arXiv
0
citations
Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos
NeurIPS 2025arXiv
0
citations
Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning
NeurIPS 2025arXiv
0
citations
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
NeurIPS 2025arXiv
0
citations
NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval
NeurIPS 2025arXiv
0
citations
EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis
NeurIPS 2025arXiv
0
citations
FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning
NeurIPS 2025arXiv
0
citations
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches
NeurIPS 2025arXiv
0
citations
OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction
NeurIPS 2025arXiv
0
citations
Order-Level Attention Similarity Across Language Models: A Latent Commonality
NeurIPS 2025arXiv
0
citations
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
NeurIPS 2025arXiv
0
citations
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
ICLR 2025
0
citations
mmWalk: Towards Multi-modal Multi-view Walking Assistance
NeurIPS 2025arXiv
0
citations
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
ICLR 2025arXiv
0
citations
F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning
NeurIPS 2025arXiv
0
citations
MuSLR: Multimodal Symbolic Logical Reasoning
NeurIPS 2025arXiv
0
citations
AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation
NeurIPS 2025arXiv
0
citations