Yang
141
Papers
2,551
Total Citations
Papers (141)
MobileNetV4: Universal Models for the Mobile Ecosystem
ECCV 2024 | arXiv
407
citations
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
ICLR 2025 | arXiv
200
citations
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
ECCV 2024 | arXiv
183
citations
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
ICLR 2025 | arXiv
134
citations
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025 | arXiv
118
citations
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
ICLR 2025 | arXiv
101
citations
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
ICLR 2025 | arXiv
65
citations
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
ICLR 2025 | arXiv
59
citations
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
ECCV 2024 | arXiv
54
citations
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
ICLR 2025 | arXiv
52
citations
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
ICLR 2025 | arXiv
46
citations
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025 | arXiv
43
citations
Trajectory Attention for Fine-grained Video Motion Control
ICLR 2025 | arXiv
40
citations
Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations
NeurIPS 2025 | arXiv
36
citations
Pyramid Diffusion for Fine 3D Large Scene Generation
ECCV 2024 | arXiv
36
citations
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
NeurIPS 2025 | arXiv
34
citations
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
NeurIPS 2025 | arXiv
31
citations
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
NeurIPS 2025 | arXiv
29
citations
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
ICLR 2025 | arXiv
28
citations
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning
NeurIPS 2025 | arXiv
27
citations
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
NeurIPS 2025 | arXiv
27
citations
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
ICLR 2025 | arXiv
26
citations
Multi-Agent Collaboration via Evolving Orchestration
NeurIPS 2025 | arXiv
25
citations
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
NeurIPS 2025 | arXiv
24
citations
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
ECCV 2024 | arXiv
23
citations
Language Imbalance Driven Rewarding for Multilingual Self-improving
ICLR 2025 | arXiv
23
citations
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
NeurIPS 2025 | arXiv
22
citations
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
ICLR 2025 | arXiv
21
citations
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
ECCV 2024 | arXiv
19
citations
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
ICLR 2025 | arXiv
18
citations
Self-Evolved Reward Learning for LLMs
ICLR 2025 | arXiv
18
citations
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
NeurIPS 2025 | arXiv
18
citations
No Preference Left Behind: Group Distributional Preference Optimization
ICLR 2025 | arXiv
17
citations
Diffusion Model is a Good Pose Estimator from 3D RF-Vision
ECCV 2024 | arXiv
17
citations
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
ICLR 2025 | arXiv
17
citations
PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
ECCV 2024 | arXiv
16
citations
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
NeurIPS 2025 | arXiv
16
citations
TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
ECCV 2024 | arXiv
16
citations
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
ICLR 2025 | arXiv
16
citations
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
ECCV 2024 | arXiv
16
citations
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
ICLR 2025 | arXiv
16
citations
Spiking Vision Transformer with Saccadic Attention
ICLR 2025 | arXiv
15
citations
Quantized Spike-driven Transformer
ICLR 2025 | arXiv
14
citations
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
ICLR 2025 | arXiv
14
citations
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
ICLR 2025 | arXiv
14
citations
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
ICLR 2025 | arXiv
13
citations
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
ECCV 2024 | arXiv
12
citations
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
ICLR 2025 | arXiv
12
citations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
NeurIPS 2025 | arXiv
11
citations
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning
NeurIPS 2025 | arXiv
10
citations
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
ECCV 2024 | arXiv
10
citations
Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
ECCV 2024 | arXiv
10
citations
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
ICLR 2025 | arXiv
10
citations
CountFormer: Multi-View Crowd Counting Transformer
ECCV 2024 | arXiv
9
citations
Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective
NeurIPS 2025 | arXiv
9
citations
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025 | arXiv
9
citations
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
ECCV 2024 | arXiv
9
citations
DataMan: Data Manager for Pre-training Large Language Models
ICLR 2025 | arXiv
8
citations
GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights
NeurIPS 2025 | arXiv
8
citations
Mitigating Memorization in Language Models
ICLR 2025 | arXiv
8
citations
PanTS: The Pancreatic Tumor Segmentation Dataset
NeurIPS 2025 | arXiv
8
citations
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
NeurIPS 2025 | arXiv
8
citations
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
ICLR 2025 | arXiv
8
citations
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
NeurIPS 2025 | arXiv
8
citations
Human Simulacra: Benchmarking the Personification of Large Language Models
ICLR 2025 | arXiv
8
citations
Learning Robust Spectral Dynamics for Temporal Domain Generalization
NeurIPS 2025 | arXiv
8
citations
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
ICLR 2025 | arXiv
7
citations
S4M: S4 for multivariate time series forecasting with Missing values
ICLR 2025 | arXiv
7
citations
Learning Chaos In A Linear Way
ICLR 2025 | arXiv
7
citations
OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents
ICLR 2025
6
citations
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
ICLR 2025 | arXiv
6
citations
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
NeurIPS 2025 | arXiv
6
citations
Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
ECCV 2024 | arXiv
5
citations
On Extending Direct Preference Optimization to Accommodate Ties
NeurIPS 2025 | arXiv
5
citations
Learning Spatial-Semantic Features for Robust Video Object Segmentation
ICLR 2025 | arXiv
5
citations
Neural Metamorphosis
ECCV 2024 | arXiv
5
citations
Self-Cooperation Knowledge Distillation for Novel Class Discovery
ECCV 2024 | arXiv
5
citations
Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion
ECCV 2024
4
citations
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
NeurIPS 2025 | arXiv
4
citations
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
NeurIPS 2025 | arXiv
4
citations
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
ICLR 2025 | arXiv
4
citations
MIRA: Medical Time Series Foundation Model for Real-World Health Data
NeurIPS 2025 | arXiv
4
citations
WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
NeurIPS 2025
4
citations
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
ECCV 2024 | arXiv
3
citations
Distilling Knowledge from Large-Scale Image Models for Object Detection
ECCV 2024
3
citations
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
NeurIPS 2025 | arXiv
3
citations
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
NeurIPS 2025 | arXiv
3
citations
Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation
NeurIPS 2025
3
citations
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
NeurIPS 2025 | arXiv
3
citations
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
NeurIPS 2025 | arXiv
3
citations
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
NeurIPS 2025 | arXiv
3
citations
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
NeurIPS 2025 | arXiv
3
citations
Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates
NeurIPS 2025 | arXiv
3
citations
TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
NeurIPS 2025 | arXiv
3
citations
DUALFormer: Dual Graph Transformer
ICLR 2025
3
citations
Physics-aligned Field Reconstruction with Diffusion Bridge
ICLR 2025
3
citations
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
ICLR 2025 | arXiv
3
citations
Reading Recognition in the Wild
NeurIPS 2025 | arXiv
2
citations
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
NeurIPS 2025 | arXiv
2
citations
Image Editing As Programs with Diffusion Models
NeurIPS 2025 | arXiv
2
citations
Environment Inference for Learning Generalizable Dynamical System
NeurIPS 2025 | arXiv
2
citations
Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
NeurIPS 2025 | arXiv
2
citations
Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
ECCV 2024 | arXiv
2
citations
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
NeurIPS 2025 | arXiv
2
citations
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
ECCV 2024 | arXiv
2
citations
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling
NeurIPS 2025 | arXiv
2
citations
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
ICLR 2025 | arXiv
2
citations
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
ICLR 2025
2
citations
EA3D: Online Open-World 3D Object Extraction from Streaming Videos
NeurIPS 2025 | arXiv
1
citation
Self-diffusion for Solving Inverse Problems
NeurIPS 2025 | arXiv
1
citation
Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning
NeurIPS 2025 | arXiv
1
citation
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
NeurIPS 2025 | arXiv
1
citation
MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting
NeurIPS 2025 | arXiv
1
citation
KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge
NeurIPS 2025 | arXiv
1
citation
Blackbox Model Provenance via Palimpsestic Membership Inference
NeurIPS 2025 | arXiv
1
citation
Risk-aware Direct Preference Optimization under Nested Risk Measure
NeurIPS 2025 | arXiv
1
citation
Dependency-aware Differentiable Neural Architecture Search
ECCV 2024
1
citation
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
NeurIPS 2025 | arXiv
1
citation
R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection
ICLR 2025 | arXiv
1
citation
X-Field: A Physically Informed Representation for 3D X-ray Reconstruction
NeurIPS 2025
1
citation
ΔEnergy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
NeurIPS 2025
0
citations
Enhancing Training Data Attribution with Representational Optimization
NeurIPS 2025 | arXiv
0
citations
Information Retrieval Induced Safety Degradation in AI Agents
NeurIPS 2025 | arXiv
0
citations
Private Mechanism Design via Quantile Estimation
ICLR 2025
0
citations
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
ICLR 2025 | arXiv
0
citations
Deployment Efficient Reward-Free Exploration with Linear Function Approximation
NeurIPS 2025
0
citations
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
NeurIPS 2025 | arXiv
0
citations
Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary
NeurIPS 2025 | arXiv
0
citations
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
NeurIPS 2025 | arXiv
0
citations
Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets
NeurIPS 2025 | arXiv
0
citations
Optimization Inspired Few-Shot Adaptation for Large Language Models
NeurIPS 2025 | arXiv
0
citations
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
NeurIPS 2025 | arXiv
0
citations
Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
NeurIPS 2025 | arXiv
0
citations
FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models
NeurIPS 2025 | arXiv
0
citations
Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls
NeurIPS 2025 | arXiv
0
citations
Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection
NeurIPS 2025 | arXiv
0
citations
PathVQ: Reforming Computational Pathology Foundation Model for Whole Slide Image Analysis via Vector Quantization
NeurIPS 2025 | arXiv
0
citations
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning
NeurIPS 2025 | arXiv
0
citations
THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations
NeurIPS 2025 | arXiv
0
citations
World Models Should Prioritize the Unification of Physical and Social Dynamics
NeurIPS 2025 | arXiv
0
citations
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
NeurIPS 2025 | arXiv
0
citations