li

134
Papers
3,395
Total Citations

Papers (134)

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

ICLR 2025, arXiv
1,016
citations

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

ICLR 2025, arXiv
351
citations

Evaluating Text-to-Visual Generation with Image-to-Text Generation

ECCV 2024, arXiv
347
citations

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

ICLR 2025, arXiv
141
citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025, arXiv
118
citations

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

ICLR 2025, arXiv
116
citations

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

ECCV 2024, arXiv
114
citations

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

ICLR 2025, arXiv
97
citations

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

NeurIPS 2025, arXiv
56
citations

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

NeurIPS 2025, arXiv
52
citations

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

ICLR 2025, arXiv
48
citations

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

ICLR 2025, arXiv
41
citations

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

ECCV 2024, arXiv
40
citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

ICLR 2025, arXiv
39
citations

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

NeurIPS 2025, arXiv
31
citations

Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection

ECCV 2024, arXiv
30
citations

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

ICLR 2025, arXiv
29
citations

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

NeurIPS 2025, arXiv
28
citations

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

ECCV 2024, arXiv
24
citations

What Makes a Good Diffusion Planner for Decision Making?

ICLR 2025, arXiv
24
citations

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

ICLR 2025, arXiv
23
citations

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

ICLR 2025, arXiv
23
citations

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

ICLR 2025, arXiv
23
citations

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

ICLR 2025, arXiv
21
citations

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

ICLR 2025, arXiv
20
citations

Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding

ECCV 2024
19
citations

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

NeurIPS 2025, arXiv
18
citations

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

ICLR 2025, arXiv
17
citations

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

NeurIPS 2025, arXiv
17
citations

GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments

NeurIPS 2025, arXiv
17
citations

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

CVPR 2024
17
citations

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

NeurIPS 2025
17
citations

TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling

ECCV 2024, arXiv
16
citations

Quantized Spike-driven Transformer

ICLR 2025, arXiv
14
citations

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

ECCV 2024, arXiv
14
citations

CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

ICLR 2025, arXiv
13
citations

On a Connection Between Imitation Learning and RLHF

ICLR 2025, arXiv
13
citations

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

ECCV 2024, arXiv
12
citations

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation

ICLR 2025, arXiv
11
citations

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

ICLR 2025
11
citations

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

ICLR 2025, arXiv
11
citations

Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

ECCV 2024, arXiv
10
citations

Motion and Structure from Event-based Normal Flow

ECCV 2024, arXiv
10
citations

KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval

ECCV 2024
10
citations

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

NeurIPS 2025, arXiv
9
citations

Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction

ECCV 2024, arXiv
9
citations

Test-time Adaptation for Cross-modal Retrieval with Query Shift

ICLR 2025, arXiv
9
citations

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning

ECCV 2024, arXiv
9
citations

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

NeurIPS 2025, arXiv
9
citations

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

ECCV 2024, arXiv
8
citations

SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios

ICLR 2025, arXiv
8
citations

Causally Motivated Sycophancy Mitigation for Large Language Models

ICLR 2025
8
citations

PanTS: The Pancreatic Tumor Segmentation Dataset

NeurIPS 2025, arXiv
8
citations

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

ICLR 2025, arXiv
7
citations

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

NeurIPS 2025, arXiv
7
citations

IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

ICLR 2025, arXiv
7
citations

Attributing Culture-Conditioned Generations to Pretraining Corpora

ICLR 2025, arXiv
7
citations

SemReg: Semantics Constrained Point Cloud Registration

ECCV 2024
7
citations

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

NeurIPS 2025, arXiv
7
citations

CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension

NeurIPS 2025, arXiv
6
citations

CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection

ECCV 2024
6
citations

LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang

ECCV 2024
6
citations

Zebra-Llama: Towards Extremely Efficient Hybrid Models

NeurIPS 2025, arXiv
6
citations

Integrative Decoding: Improving Factuality via Implicit Self-consistency

ICLR 2025, arXiv
6
citations

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

NeurIPS 2025, arXiv
6
citations

BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models

NeurIPS 2025, arXiv
6
citations

Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology

NeurIPS 2025, arXiv
5
citations

The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition

NeurIPS 2025, arXiv
5
citations

IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

ICLR 2025, arXiv
5
citations

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

NeurIPS 2025, arXiv
5
citations

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

ECCV 2024, arXiv
5
citations

Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search

NeurIPS 2025, arXiv
5
citations

Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning

NeurIPS 2025, arXiv
5
citations

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

NeurIPS 2025, arXiv
4
citations

Exploring Diffusion Transformer Designs via Grafting

NeurIPS 2025, arXiv
4
citations

Characterizing the Expressivity of Fixed-Precision Transformer Language Models

NeurIPS 2025, arXiv
4
citations

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

ICLR 2025, arXiv
4
citations

Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization

NeurIPS 2025, arXiv
4
citations

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

NeurIPS 2025, arXiv
4
citations

On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

ICLR 2025, arXiv
3
citations

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

NeurIPS 2025, arXiv
3
citations

DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation

NeurIPS 2025, arXiv
3
citations

GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning

NeurIPS 2025, arXiv
3
citations

Distilling Knowledge from Large-Scale Image Models for Object Detection

ECCV 2024
3
citations

Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network

ECCV 2024, arXiv
3
citations

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

NeurIPS 2025, arXiv
3
citations

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

ICLR 2025
2
citations

MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference

NeurIPS 2025, arXiv
2
citations

Online Video Quality Enhancement with Spatial-Temporal Look-up Tables

ECCV 2024, arXiv
2
citations

Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation

ECCV 2024
2
citations

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

NeurIPS 2025, arXiv
2
citations

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

NeurIPS 2025, arXiv
2
citations

Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling

NeurIPS 2025, arXiv
2
citations

Matrix Product Sketching via Coordinated Sampling

ICLR 2025, arXiv
2
citations

PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

NeurIPS 2025, arXiv
2
citations

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

NeurIPS 2025, arXiv
2
citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

ICLR 2025, arXiv
2
citations

Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

ICLR 2025, arXiv
2
citations

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

NeurIPS 2025, arXiv
1
citation

CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing

NeurIPS 2025, arXiv
1
citation

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

ICLR 2025
1
citation

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

NeurIPS 2025, arXiv
1
citation

SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction

NeurIPS 2025, arXiv
1
citation

Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling

NeurIPS 2025, arXiv
1
citation

RoFt-Mol: Benchmarking Robust Fine-tuning with Molecular Graph Foundation Models

NeurIPS 2025, arXiv
1
citation

UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

NeurIPS 2025, arXiv
1
citation

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree

NeurIPS 2025, arXiv
1
citation

Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems

NeurIPS 2025
1
citation

Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering

NeurIPS 2025, arXiv
1
citation

DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

NeurIPS 2025, arXiv
0
citations

Physically Plausible Color Correction for Neural Radiance Fields

ECCV 2024
0
citations

Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring

ECCV 2024, arXiv
0
citations

COIN-Matting: Confounder Intervention for Image Matting

ECCV 2024
0
citations

Toward a Unified Geometry Understanding: Riemannian Diffusion Framework for Graph Generation and Prediction

NeurIPS 2025, arXiv
0
citations

Revealing Multimodal Causality with Large Language Models

NeurIPS 2025, arXiv
0
citations

Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism

NeurIPS 2025, arXiv
0
citations

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

NeurIPS 2025, arXiv
0
citations

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

NeurIPS 2025, arXiv
0
citations

Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls

NeurIPS 2025, arXiv
0
citations

Order-Level Attention Similarity Across Language Models: A Latent Commonality

NeurIPS 2025, arXiv
0
citations

Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations

NeurIPS 2025, arXiv
0
citations

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

NeurIPS 2025, arXiv
0
citations

The Primacy of Magnitude in Low-Rank Adaptation

NeurIPS 2025, arXiv
0
citations

NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval

NeurIPS 2025, arXiv
0
citations

Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary

NeurIPS 2025, arXiv
0
citations

Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models

NeurIPS 2025, arXiv
0
citations

Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits

NeurIPS 2025, arXiv
0
citations

Real-World Reinforcement Learning of Active Perception Behaviors

NeurIPS 2025, arXiv
0
citations

ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos

NeurIPS 2025, arXiv
0
citations

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios

NeurIPS 2025, arXiv
0
citations

Purest Quantum State Identification

NeurIPS 2025, arXiv
0
citations

Don’t Forget the Enjoin: FocalLoRA for Instruction Hierarchical Alignment in Large Language Models

NeurIPS 2025
0
citations

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

ICLR 2025, arXiv
0
citations

Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees

ICLR 2025, arXiv
0
citations