Lin
49
Papers
2,051
Total Citations
Papers (49)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
ICLR 2025, arXiv
351
citations
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
ECCV 2024, arXiv
343
citations
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
ECCV 2024, arXiv
223
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025, arXiv
118
citations
Data Scaling Laws in Imitation Learning for Robotic Manipulation
ICLR 2025, arXiv
115
citations
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025, arXiv
108
citations
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
ICLR 2025, arXiv
101
citations
RegMix: Data Mixture as Regression for Language Model Pre-training
ICLR 2025, arXiv
99
citations
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
NeurIPS 2025, arXiv
96
citations
ImgEdit: A Unified Image Editing Dataset and Benchmark
NeurIPS 2025, arXiv
84
citations
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
ICLR 2025, arXiv
78
citations
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
ICLR 2025, arXiv
40
citations
Theory on Mixture-of-Experts in Continual Learning
ICLR 2025, arXiv
40
citations
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
ICLR 2025, arXiv
33
citations
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
NeurIPS 2025, arXiv
29
citations
Fast Feedforward 3D Gaussian Splatting Compression
ICLR 2025, arXiv
26
citations
Text-to-Image Rectified Flow as Plug-and-Play Priors
ICLR 2025, arXiv
23
citations
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
ECCV 2024, arXiv
19
citations
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
ECCV 2024
14
citations
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
ECCV 2024, arXiv
12
citations
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
ECCV 2024, arXiv
11
citations
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
NeurIPS 2025, arXiv
9
citations
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
ICLR 2025, arXiv
8
citations
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
NeurIPS 2025, arXiv
8
citations
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector
ICLR 2025, arXiv
8
citations
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
NeurIPS 2025, arXiv
8
citations
DataMan: Data Manager for Pre-training Large Language Models
ICLR 2025, arXiv
8
citations
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
ICLR 2025, arXiv
7
citations
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
NeurIPS 2025, arXiv
5
citations
Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation
ICLR 2025, arXiv
4
citations
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
NeurIPS 2025, arXiv
4
citations
SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process
ICLR 2025
4
citations
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
NeurIPS 2025, arXiv
2
citations
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
NeurIPS 2025
2
citations
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
ICLR 2025, arXiv
2
citations
Teaching Language Models to Reason with Tools
NeurIPS 2025, arXiv
2
citations
Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling
NeurIPS 2025, arXiv
2
citations
Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
NeurIPS 2025, arXiv
2
citations
RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
NeurIPS 2025, arXiv
2
citations
Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining
NeurIPS 2025, arXiv
1
citation
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
NeurIPS 2025, arXiv
0
citations
Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data
NeurIPS 2025, arXiv
0
citations
PlanU: Large Language Model Reasoning through Planning under Uncertainty
NeurIPS 2025, arXiv
0
citations
TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model
NeurIPS 2025, arXiv
0
citations
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
NeurIPS 2025, arXiv
0
citations
Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
ICLR 2025, arXiv
0
citations
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
NeurIPS 2025, arXiv
0
citations
Sampled Estimators For Softmax Must Be Biased
NeurIPS 2025
0
citations
Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study
NeurIPS 2025, arXiv
0
citations