Zhao

43
Papers
1,097
Total Citations

Papers (43)

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

ECCV 2024 · arXiv
343
citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025 · arXiv
118
citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

ECCV 2024 · arXiv
82
citations

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

ICLR 2025 · arXiv
61
citations

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

ICLR 2025 · arXiv
53
citations

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

ECCV 2024 · arXiv
44
citations

Dynamic Diffusion Transformer

ICLR 2025 · arXiv
34
citations

Informed Correctors for Discrete Diffusion Models

NeurIPS 2025 · arXiv
31
citations

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

ICLR 2025 · arXiv
30
citations

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation

NeurIPS 2025 · arXiv
25
citations

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

ECCV 2024 · arXiv
23
citations

Region-Adaptive Transform with Segmentation Prior for Image Compression

ECCV 2024 · arXiv
21
citations

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

ECCV 2024 · arXiv
19
citations

Commit0: Library Generation from Scratch

ICLR 2025 · arXiv
18
citations

InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping

ECCV 2024 · arXiv
18
citations

OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

ECCV 2024 · arXiv
18
citations

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

NeurIPS 2025 · arXiv
18
citations

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

ICLR 2025 · arXiv
16
citations

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

NeurIPS 2025 · arXiv
16
citations

CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

ICLR 2025 · arXiv
13
citations

Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

NeurIPS 2025 · arXiv
11
citations

CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems

ECCV 2024 · arXiv
10
citations

CLEVER: A Curated Benchmark for Formally Verified Code Generation

NeurIPS 2025 · arXiv
10
citations

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

ECCV 2024 · arXiv
9
citations

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

ICLR 2025 · arXiv
9
citations

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

NeurIPS 2025 · arXiv
7
citations

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

ICLR 2025 · arXiv
7
citations

T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks

NeurIPS 2025 · arXiv
6
citations

Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models

ICLR 2025 · arXiv
5
citations

The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition

NeurIPS 2025 · arXiv
5
citations

Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers

ECCV 2024
4
citations

Towards foundational LiDAR world models with efficient latent flow matching

NeurIPS 2025 · arXiv
4
citations

TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration

NeurIPS 2025 · arXiv
3
citations

TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine

NeurIPS 2025 · arXiv
2
citations

PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling

NeurIPS 2025 · arXiv
1
citation

Capability Localization: Capabilities Can be Localized rather than Individual Knowledge

ICLR 2025 · arXiv
1
citation

PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph

ICLR 2025 · arXiv
1
citation

Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks

NeurIPS 2025 · arXiv
1
citation

Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization

ECCV 2024
0
citations

Simulating Society Requires Simulating Thought

NeurIPS 2025 · arXiv
0
citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

NeurIPS 2025 · arXiv
0
citations

Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study

NeurIPS 2025 · arXiv
0
citations

Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Observation Delays

NeurIPS 2025
0
citations