134
Papers
3,395
Total Citations
Papers (134)
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
ICLR 2025arXiv
1,016
citations
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
ICLR 2025arXiv
351
citations
Evaluating Text-to-Visual Generation with Image-to-Text Generation
ECCV 2024arXiv
347
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025arXiv
141
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025arXiv
118
citations
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
ICLR 2025arXiv
116
citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
ECCV 2024arXiv
114
citations
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
ICLR 2025arXiv
97
citations
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
NeurIPS 2025arXiv
56
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NeurIPS 2025arXiv
52
citations
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025arXiv
48
citations
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
ICLR 2025arXiv
41
citations
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
ECCV 2024arXiv
40
citations
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
ICLR 2025arXiv
39
citations
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
NeurIPS 2025arXiv
31
citations
Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
ECCV 2024arXiv
30
citations
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
ICLR 2025arXiv
29
citations
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
NeurIPS 2025arXiv
28
citations
EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
ECCV 2024arXiv
24
citations
What Makes a Good Diffusion Planner for Decision Making?
ICLR 2025arXiv
24
citations
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
ICLR 2025arXiv
23
citations
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
ICLR 2025arXiv
23
citations
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
ICLR 2025arXiv
23
citations
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
ICLR 2025arXiv
21
citations
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
ICLR 2025arXiv
20
citations
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
ECCV 2024
19
citations
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
NeurIPS 2025arXiv
18
citations
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
ICLR 2025arXiv
17
citations
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
NeurIPS 2025arXiv
17
citations
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
NeurIPS 2025arXiv
17
citations
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
CVPR 2024
17
citations
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
NeurIPS 2025
17
citations
TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
ECCV 2024arXiv
16
citations
Quantized Spike-driven Transformer
ICLR 2025arXiv
14
citations
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
ECCV 2024arXiv
14
citations
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
ICLR 2025arXiv
13
citations
On a Connection Between Imitation Learning and RLHF
ICLR 2025arXiv
13
citations
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
ECCV 2024arXiv
12
citations
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
ICLR 2025arXiv
11
citations
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
ICLR 2025
11
citations
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
ICLR 2025arXiv
11
citations
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
ECCV 2024arXiv
10
citations
Motion and Structure from Event-based Normal Flow
ECCV 2024arXiv
10
citations
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
ECCV 2024
10
citations
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
NeurIPS 2025arXiv
9
citations
Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
ECCV 2024arXiv
9
citations
Test-time Adaptation for Cross-modal Retrieval with Query Shift
ICLR 2025arXiv
9
citations
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
ECCV 2024arXiv
9
citations
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
NeurIPS 2025arXiv
9
citations
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
ECCV 2024arXiv
8
citations
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
ICLR 2025arXiv
8
citations
Causally Motivated Sycophancy Mitigation for Large Language Models
ICLR 2025
8
citations
PanTS: The Pancreatic Tumor Segmentation Dataset
NeurIPS 2025arXiv
8
citations
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
ICLR 2025arXiv
7
citations
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
NeurIPS 2025arXiv
7
citations
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
ICLR 2025arXiv
7
citations
Attributing Culture-Conditioned Generations to Pretraining Corpora
ICLR 2025arXiv
7
citations
SemReg: Semantics Constrained Point Cloud Registration
ECCV 2024
7
citations
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning
NeurIPS 2025arXiv
7
citations
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
NeurIPS 2025arXiv
6
citations
CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
ECCV 2024
6
citations
LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
ECCV 2024
6
citations
Zebra-Llama: Towards Extremely Efficient Hybrid Models
NeurIPS 2025arXiv
6
citations
Integrative Decoding: Improving Factuality via Implicit Self-consistency
ICLR 2025arXiv
6
citations
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
NeurIPS 2025arXiv
6
citations
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models
NeurIPS 2025arXiv
6
citations
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
NeurIPS 2025arXiv
5
citations
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
NeurIPS 2025arXiv
5
citations
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
ICLR 2025arXiv
5
citations
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
NeurIPS 2025arXiv
5
citations
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
ECCV 2024arXiv
5
citations
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
NeurIPS 2025arXiv
5
citations
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
NeurIPS 2025arXiv
5
citations
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
NeurIPS 2025arXiv
4
citations
Exploring Diffusion Transformer Designs via Grafting
NeurIPS 2025arXiv
4
citations
Characterizing the Expressivity of Fixed-Precision Transformer Language Models
NeurIPS 2025arXiv
4
citations
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
ICLR 2025arXiv
4
citations
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
NeurIPS 2025arXiv
4
citations
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
NeurIPS 2025arXiv
4
citations
On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations
ICLR 2025arXiv
3
citations
SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning
NeurIPS 2025arXiv
3
citations
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
NeurIPS 2025arXiv
3
citations
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
NeurIPS 2025arXiv
3
citations
Distilling Knowledge from Large-Scale Image Models for Object Detection
ECCV 2024
3
citations
Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
ECCV 2024arXiv
3
citations
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
NeurIPS 2025arXiv
3
citations
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
ICLR 2025
2
citations
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference
NeurIPS 2025arXiv
2
citations
Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
ECCV 2024arXiv
2
citations
Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
ECCV 2024
2
citations
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
NeurIPS 2025arXiv
2
citations
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
NeurIPS 2025arXiv
2
citations
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling
NeurIPS 2025arXiv
2
citations
Matrix Product Sketching via Coordinated Sampling
ICLR 2025arXiv
2
citations
PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
NeurIPS 2025arXiv
2
citations
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
NeurIPS 2025arXiv
2
citations
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
ICLR 2025arXiv
2
citations
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection
ICLR 2025arXiv
2
citations
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
NeurIPS 2025arXiv
1
citation
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
NeurIPS 2025arXiv
1
citation
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
ICLR 2025
1
citation
TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
NeurIPS 2025arXiv
1
citation
SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction
NeurIPS 2025arXiv
1
citation
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
NeurIPS 2025arXiv
1
citation
RoFt-Mol: Benchmarking Robust Fine-tuning with Molecular Graph Foundation Models
NeurIPS 2025arXiv
1
citation
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
NeurIPS 2025arXiv
1
citation
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
NeurIPS 2025arXiv
1
citation
Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems
NeurIPS 2025
1
citation
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
NeurIPS 2025arXiv
1
citation
DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering
NeurIPS 2025arXiv
0
citations
Physically Plausible Color Correction for Neural Radiance Fields
ECCV 2024
0
citations
Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
ECCV 2024arXiv
0
citations
COIN-Matting: Confounder Intervention for Image Matting
ECCV 2024
0
citations
Toward a Unified Geometry Understanding: Riemannian Diffusion Framework for Graph Generation and Prediction
NeurIPS 2025arXiv
0
citations
Revealing Multimodal Causality with Large Language Models
NeurIPS 2025arXiv
0
citations
Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism
NeurIPS 2025arXiv
0
citations
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
NeurIPS 2025arXiv
0
citations
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
NeurIPS 2025arXiv
0
citations
Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls
NeurIPS 2025arXiv
0
citations
Order-Level Attention Similarity Across Language Models: A Latent Commonality
NeurIPS 2025arXiv
0
citations
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
NeurIPS 2025arXiv
0
citations
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
NeurIPS 2025arXiv
0
citations
The Primacy of Magnitude in Low-Rank Adaptation
NeurIPS 2025arXiv
0
citations
NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval
NeurIPS 2025arXiv
0
citations
Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary
NeurIPS 2025arXiv
0
citations
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
NeurIPS 2025arXiv
0
citations
Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
NeurIPS 2025arXiv
0
citations
Real-World Reinforcement Learning of Active Perception Behaviors
NeurIPS 2025arXiv
0
citations
ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos
NeurIPS 2025arXiv
0
citations
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
NeurIPS 2025arXiv
0
citations
Purest Quantum State Identification
NeurIPS 2025arXiv
0
citations
Don’t Forget the Enjoin: FocalLoRA for Instruction Hierarchical Alignment in Large Language Models
NeurIPS 2025
0
citations
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
ICLR 2025arXiv
0
citations
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
ICLR 2025arXiv
0
citations