Lin

49
Papers
2,051
Total Citations

Papers (49)

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

ICLR 2025 · arXiv
351
citations

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

ECCV 2024 · arXiv
343
citations

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

ECCV 2024 · arXiv
223
citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025 · arXiv
118
citations

Data Scaling Laws in Imitation Learning for Robotic Manipulation

ICLR 2025 · arXiv
115
citations

Tamper-Resistant Safeguards for Open-Weight LLMs

ICLR 2025 · arXiv
108
citations

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

ICLR 2025 · arXiv
101
citations

RegMix: Data Mixture as Regression for Language Model Pre-training

ICLR 2025 · arXiv
99
citations

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

NeurIPS 2025 · arXiv
96
citations

ImgEdit: A Unified Image Editing Dataset and Benchmark

NeurIPS 2025 · arXiv
84
citations

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

ICLR 2025 · arXiv
78
citations

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

ICLR 2025 · arXiv
40
citations

Theory on Mixture-of-Experts in Continual Learning

ICLR 2025 · arXiv
40
citations

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

ICLR 2025 · arXiv
33
citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

NeurIPS 2025 · arXiv
29
citations

Fast Feedforward 3D Gaussian Splatting Compression

ICLR 2025 · arXiv
26
citations

Text-to-Image Rectified Flow as Plug-and-Play Priors

ICLR 2025 · arXiv
23
citations

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

ECCV 2024 · arXiv
19
citations

Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment

ECCV 2024
14
citations

Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection

ECCV 2024 · arXiv
12
citations

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

ECCV 2024 · arXiv
11
citations

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

NeurIPS 2025 · arXiv
9
citations

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

ICLR 2025 · arXiv
8
citations

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

NeurIPS 2025 · arXiv
8
citations

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

ICLR 2025 · arXiv
8
citations

VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

NeurIPS 2025 · arXiv
8
citations

DataMan: Data Manager for Pre-training Large Language Models

ICLR 2025 · arXiv
8
citations

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

ICLR 2025 · arXiv
7
citations

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

NeurIPS 2025 · arXiv
5
citations

Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation

ICLR 2025 · arXiv
4
citations

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning

NeurIPS 2025 · arXiv
4
citations

SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process

ICLR 2025
4
citations

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

NeurIPS 2025 · arXiv
2
citations

Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM

NeurIPS 2025
2
citations

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation

ICLR 2025 · arXiv
2
citations

Teaching Language Models to Reason with Tools

NeurIPS 2025 · arXiv
2
citations

Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling

NeurIPS 2025 · arXiv
2
citations

Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation

NeurIPS 2025 · arXiv
2
citations

RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills

NeurIPS 2025 · arXiv
2
citations

Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining

NeurIPS 2025 · arXiv
1
citation

Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

NeurIPS 2025 · arXiv
0
citations

Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data

NeurIPS 2025 · arXiv
0
citations

PlanU: Large Language Model Reasoning through Planning under Uncertainty

NeurIPS 2025 · arXiv
0
citations

TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model

NeurIPS 2025 · arXiv
0
citations

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

NeurIPS 2025 · arXiv
0
citations

Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos

ICLR 2025 · arXiv
0
citations

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

NeurIPS 2025 · arXiv
0
citations

Sampled Estimators For Softmax Must Be Biased

NeurIPS 2025
0
citations

Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study

NeurIPS 2025 · arXiv
0
citations