li
74
Papers
1,457
Total Citations
Papers (74)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
ICLR 2025 · arXiv
351
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025 · arXiv
141
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025 · arXiv
118
citations
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
ICLR 2025 · arXiv
116
citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
ECCV 2024 · arXiv
114
citations
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
NeurIPS 2025 · arXiv
56
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NeurIPS 2025 · arXiv
52
citations
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
ECCV 2024 · arXiv
40
citations
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
ICLR 2025 · arXiv
39
citations
Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
ECCV 2024 · arXiv
30
citations
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
ICLR 2025 · arXiv
29
citations
EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
ECCV 2024 · arXiv
24
citations
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
ICLR 2025 · arXiv
23
citations
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
ICLR 2025 · arXiv
23
citations
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
ICLR 2025 · arXiv
23
citations
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
ICLR 2025 · arXiv
21
citations
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
ICLR 2025 · arXiv
20
citations
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
NeurIPS 2025
17
citations
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
CVPR 2024
17
citations
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
NeurIPS 2025 · arXiv
17
citations
Quantized Spike-driven Transformer
ICLR 2025 · arXiv
14
citations
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
ICLR 2025 · arXiv
13
citations
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
ECCV 2024 · arXiv
12
citations
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
ICLR 2025 · arXiv
11
citations
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
NeurIPS 2025 · arXiv
9
citations
Test-time Adaptation for Cross-modal Retrieval with Query Shift
ICLR 2025 · arXiv
9
citations
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
NeurIPS 2025 · arXiv
9
citations
Causally Motivated Sycophancy Mitigation for Large Language Models
ICLR 2025
8
citations
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
ICLR 2025 · arXiv
7
citations
SemReg: Semantics Constrained Point Cloud Registration
ECCV 2024
7
citations
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
NeurIPS 2025 · arXiv
7
citations
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
NeurIPS 2025 · arXiv
6
citations
LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
ECCV 2024
6
citations
CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
ECCV 2024
6
citations
Integrative Decoding: Improving Factuality via Implicit Self-consistency
ICLR 2025 · arXiv
6
citations
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
NeurIPS 2025 · arXiv
5
citations
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
NeurIPS 2025 · arXiv
5
citations
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
NeurIPS 2025 · arXiv
4
citations
Characterizing the Expressivity of Fixed-Precision Transformer Language Models
NeurIPS 2025 · arXiv
4
citations
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
NeurIPS 2025 · arXiv
4
citations
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
NeurIPS 2025 · arXiv
3
citations
Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
ECCV 2024 · arXiv
3
citations
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
NeurIPS 2025 · arXiv
3
citations
Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
ECCV 2024
2
citations
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference
NeurIPS 2025 · arXiv
2
citations
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
ICLR 2025
2
citations
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling
NeurIPS 2025 · arXiv
2
citations
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
ICLR 2025 · arXiv
2
citations
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
NeurIPS 2025 · arXiv
2
citations
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection
ICLR 2025 · arXiv
2
citations
Matrix Product Sketching via Coordinated Sampling
ICLR 2025 · arXiv
2
citations
RoFt-Mol: Benchmarking Robust Fine-tuning with Molecular Graph Foundation Models
NeurIPS 2025 · arXiv
1
citation
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
NeurIPS 2025 · arXiv
1
citation
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
NeurIPS 2025 · arXiv
1
citation
Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems
NeurIPS 2025
1
citation
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
NeurIPS 2025 · arXiv
1
citation
TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
NeurIPS 2025 · arXiv
1
citation
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
NeurIPS 2025 · arXiv
1
citation
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
ICLR 2025
1
citation
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
NeurIPS 2025 · arXiv
1
citation
ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos
NeurIPS 2025 · arXiv
0
citations
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
NeurIPS 2025 · arXiv
0
citations
Revealing Multimodal Causality with Large Language Models
NeurIPS 2025 · arXiv
0
citations
DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering
NeurIPS 2025 · arXiv
0
citations
Order-Level Attention Similarity Across Language Models: A Latent Commonality
NeurIPS 2025 · arXiv
0
citations
Don’t Forget the Enjoin: FocalLoRA for Instruction Hierarchical Alignment in Large Language Models
NeurIPS 2025
0
citations
Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls
NeurIPS 2025 · arXiv
0
citations
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
NeurIPS 2025 · arXiv
0
citations
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
NeurIPS 2025 · arXiv
0
citations
Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism
NeurIPS 2025 · arXiv
0
citations
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
ICLR 2025 · arXiv
0
citations
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
ICLR 2025 · arXiv
0
citations
NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval
NeurIPS 2025 · arXiv
0
citations
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
NeurIPS 2025 · arXiv
0
citations