Yang

141 Papers · 2,551 Total Citations

Papers (141)

MobileNetV4: Universal Models for the Mobile Ecosystem

ECCV 2024 · arXiv · 407 citations

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

ICLR 2025 · arXiv · 200 citations

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

ECCV 2024 · arXiv · 183 citations

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

ICLR 2025 · arXiv · 134 citations

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

NeurIPS 2025 · arXiv · 118 citations

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

ICLR 2025 · arXiv · 101 citations

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

ICLR 2025 · arXiv · 65 citations

CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching

ICLR 2025 · arXiv · 59 citations

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

ECCV 2024 · arXiv · 54 citations

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

ICLR 2025 · arXiv · 52 citations

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

ICLR 2025 · arXiv · 46 citations

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

ICLR 2025 · arXiv · 43 citations

Trajectory attention for fine-grained video motion control

ICLR 2025 · arXiv · 40 citations

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

NeurIPS 2025 · arXiv · 36 citations

Pyramid Diffusion for Fine 3D Large Scene Generation

ECCV 2024 · arXiv · 36 citations

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

NeurIPS 2025 · arXiv · 34 citations

EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

NeurIPS 2025 · arXiv · 31 citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

NeurIPS 2025 · arXiv · 29 citations

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

ICLR 2025 · arXiv · 28 citations

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

NeurIPS 2025 · arXiv · 27 citations

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

NeurIPS 2025 · arXiv · 27 citations

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

ICLR 2025 · arXiv · 26 citations

Multi-Agent Collaboration via Evolving Orchestration

NeurIPS 2025 · arXiv · 25 citations

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

NeurIPS 2025 · arXiv · 24 citations

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

ECCV 2024 · arXiv · 23 citations

Language Imbalance Driven Rewarding for Multilingual Self-improving

ICLR 2025 · arXiv · 23 citations

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

NeurIPS 2025 · arXiv · 22 citations

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

ICLR 2025 · arXiv · 21 citations

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

ECCV 2024 · arXiv · 19 citations

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

ICLR 2025 · arXiv · 18 citations

SELF-EVOLVED REWARD LEARNING FOR LLMS

ICLR 2025 · arXiv · 18 citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

NeurIPS 2025 · arXiv · 18 citations

No Preference Left Behind: Group Distributional Preference Optimization

ICLR 2025 · arXiv · 17 citations

Diffusion Model is a Good Pose Estimator from 3D RF-Vision

ECCV 2024 · arXiv · 17 citations

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

ICLR 2025 · arXiv · 17 citations

PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration

ECCV 2024 · arXiv · 16 citations

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

NeurIPS 2025 · arXiv · 16 citations

TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling

ECCV 2024 · arXiv · 16 citations

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

ICLR 2025 · arXiv · 16 citations

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

ECCV 2024 · arXiv · 16 citations

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding

ICLR 2025 · arXiv · 16 citations

Spiking Vision Transformer with Saccadic Attention

ICLR 2025 · arXiv · 15 citations

Quantized Spike-driven Transformer

ICLR 2025 · arXiv · 14 citations

Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint

ICLR 2025 · arXiv · 14 citations

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

ICLR 2025 · arXiv · 14 citations

SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

ICLR 2025 · arXiv · 13 citations

MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation

ECCV 2024 · arXiv · 12 citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

ICLR 2025 · arXiv · 12 citations

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

NeurIPS 2025 · arXiv · 11 citations

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

NeurIPS 2025 · arXiv · 10 citations

Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

ECCV 2024 · arXiv · 10 citations

Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

ECCV 2024 · arXiv · 10 citations

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

ICLR 2025 · arXiv · 10 citations

CountFormer: Multi-View Crowd Counting Transformer

ECCV 2024 · arXiv · 9 citations

Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective

NeurIPS 2025 · arXiv · 9 citations

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

ICLR 2025 · arXiv · 9 citations

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning

ECCV 2024 · arXiv · 9 citations

DataMan: Data Manager for Pre-training Large Language Models

ICLR 2025 · arXiv · 8 citations

GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights

NeurIPS 2025 · arXiv · 8 citations

Mitigating Memorization in Language Models

ICLR 2025 · arXiv · 8 citations

PanTS: The Pancreatic Tumor Segmentation Dataset

NeurIPS 2025 · arXiv · 8 citations

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

NeurIPS 2025 · arXiv · 8 citations

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

ICLR 2025 · arXiv · 8 citations

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

NeurIPS 2025 · arXiv · 8 citations

Human Simulacra: Benchmarking the Personification of Large Language Models

ICLR 2025 · arXiv · 8 citations

Learning Robust Spectral Dynamics for Temporal Domain Generalization

NeurIPS 2025 · arXiv · 8 citations

PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling

ICLR 2025 · arXiv · 7 citations

S4M: S4 for multivariate time series forecasting with Missing values

ICLR 2025 · arXiv · 7 citations

Learning Chaos In A Linear Way

ICLR 2025 · arXiv · 7 citations

OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents

ICLR 2025 · 6 citations

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

ICLR 2025 · arXiv · 6 citations

Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models

NeurIPS 2025 · arXiv · 6 citations

Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation

ECCV 2024 · arXiv · 5 citations

On Extending Direct Preference Optimization to Accommodate Ties

NeurIPS 2025 · arXiv · 5 citations

Learning Spatial-Semantic Features for Robust Video Object Segmentation

ICLR 2025 · arXiv · 5 citations

Neural Metamorphosis

ECCV 2024 · arXiv · 5 citations

Self-Cooperation Knowledge Distillation for Novel Class Discovery

ECCV 2024 · arXiv · 5 citations

Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion

ECCV 2024 · 4 citations

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

NeurIPS 2025 · arXiv · 4 citations

FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design

NeurIPS 2025 · arXiv · 4 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

ICLR 2025 · arXiv · 4 citations

MIRA: Medical Time Series Foundation Model for Real-World Health Data

NeurIPS 2025 · arXiv · 4 citations

WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

NeurIPS 2025 · 4 citations

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

ECCV 2024 · arXiv · 3 citations

Distilling Knowledge from Large-Scale Image Models for Object Detection

ECCV 2024 · 3 citations

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

NeurIPS 2025 · arXiv · 3 citations

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

NeurIPS 2025 · arXiv · 3 citations

Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation

NeurIPS 2025 · 3 citations

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

NeurIPS 2025 · arXiv · 3 citations

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

NeurIPS 2025 · arXiv · 3 citations

DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation

NeurIPS 2025 · arXiv · 3 citations

CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling

NeurIPS 2025 · arXiv · 3 citations

Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates

NeurIPS 2025 · arXiv · 3 citations

TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer

NeurIPS 2025 · arXiv · 3 citations

DUALFormer: Dual Graph Transformer

ICLR 2025 · 3 citations

Physics-aligned field reconstruction with diffusion bridge

ICLR 2025 · 3 citations

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

ICLR 2025 · arXiv · 3 citations

Reading Recognition in the Wild

NeurIPS 2025 · arXiv · 2 citations

See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction

NeurIPS 2025 · arXiv · 2 citations

Image Editing As Programs with Diffusion Models

NeurIPS 2025 · arXiv · 2 citations

Environment Inference for Learning Generalizable Dynamical System

NeurIPS 2025 · arXiv · 2 citations

Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding

NeurIPS 2025 · arXiv · 2 citations

Online Video Quality Enhancement with Spatial-Temporal Look-up Tables

ECCV 2024 · arXiv · 2 citations

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

NeurIPS 2025 · arXiv · 2 citations

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

ECCV 2024 · arXiv · 2 citations

Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling

NeurIPS 2025 · arXiv · 2 citations

ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding

ICLR 2025 · arXiv · 2 citations

Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary

ICLR 2025 · 2 citations

EA3D: Online Open-World 3D Object Extraction from Streaming Videos

NeurIPS 2025 · arXiv · 1 citation

Self-diffusion for Solving Inverse Problems

NeurIPS 2025 · arXiv · 1 citation

Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning

NeurIPS 2025 · arXiv · 1 citation

Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning

NeurIPS 2025 · arXiv · 1 citation

MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting

NeurIPS 2025 · arXiv · 1 citation

KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge

NeurIPS 2025 · arXiv · 1 citation

Blackbox Model Provenance via Palimpsestic Membership Inference

NeurIPS 2025 · arXiv · 1 citation

Risk-aware Direct Preference Optimization under Nested Risk Measure

NeurIPS 2025 · arXiv · 1 citation

Dependency-aware Differentiable Neural Architecture Search

ECCV 2024 · 1 citation

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

NeurIPS 2025 · arXiv · 1 citation

R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection

ICLR 2025 · arXiv · 1 citation

X-Field: A Physically Informed Representation for 3D X-ray Reconstruction

NeurIPS 2025 · 1 citation

ΔEnergy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

NeurIPS 2025 · 0 citations

Enhancing Training Data Attribution with Representational Optimization

NeurIPS 2025 · arXiv · 0 citations

Information Retrieval Induced Safety Degradation in AI Agents

NeurIPS 2025 · arXiv · 0 citations

Private Mechanism Design via Quantile Estimation

ICLR 2025 · 0 citations

Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees

ICLR 2025 · arXiv · 0 citations

Deployment Efficient Reward-Free Exploration with Linear Function Approximation

NeurIPS 2025 · 0 citations

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

NeurIPS 2025 · arXiv · 0 citations

Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary

NeurIPS 2025 · arXiv · 0 citations

ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection

NeurIPS 2025 · arXiv · 0 citations

Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets

NeurIPS 2025 · arXiv · 0 citations

Optimization Inspired Few-Shot Adaptation for Large Language Models

NeurIPS 2025 · arXiv · 0 citations

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

NeurIPS 2025 · arXiv · 0 citations

Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets

NeurIPS 2025 · arXiv · 0 citations

FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models

NeurIPS 2025 · arXiv · 0 citations

Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls

NeurIPS 2025 · arXiv · 0 citations

Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection

NeurIPS 2025 · arXiv · 0 citations

PathVQ: Reforming Computational Pathology Foundation Model for Whole Slide Image Analysis via Vector Quantization

NeurIPS 2025 · arXiv · 0 citations

HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning

NeurIPS 2025 · arXiv · 0 citations

THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations

NeurIPS 2025 · arXiv · 0 citations

World Models Should Prioritize the Unification of Physical and Social Dynamics

NeurIPS 2025 · arXiv · 0 citations

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation

NeurIPS 2025 · arXiv · 0 citations