Luo
49 Papers · 160 Total Citations
Papers (49)
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
CVPR 2025 · arXiv · 68 citations
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
ICLR 2025 · arXiv · 33 citations
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
ECCV 2024 · arXiv · 10 citations
Uncertainty-aware sign language video retrieval with probability distribution modeling
ECCV 2024 · arXiv · 10 citations
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
ICLR 2025 · arXiv · 7 citations
Latent Chain-of-Thought for Visual Reasoning
NeurIPS 2025 · arXiv · 7 citations
Simultaneous Swap Regret Minimization via KL-Calibration
NeurIPS 2025 · arXiv · 6 citations
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
ECCV 2024 · arXiv · 6 citations
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025 · arXiv · 4 citations
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
NeurIPS 2025 · arXiv · 4 citations
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
CVPR 2025 · arXiv · 3 citations
Attention! Your Vision Language Model Could Be Maliciously Manipulated
NeurIPS 2025 · arXiv · 2 citations
DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering
NeurIPS 2025 · arXiv · 0 citations
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
ICLR 2025 · arXiv · 0 citations
SysBench: Can LLMs Follow System Message?
ICLR 2025 · 0 citations
Real-World Reinforcement Learning of Active Perception Behaviors
NeurIPS 2025 · arXiv · 0 citations
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
NeurIPS 2025 · arXiv · 0 citations
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
ECCV 2024 · arXiv · 0 citations
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025 · arXiv · 0 citations
Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
ICLR 2025 · arXiv · 0 citations
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
ICLR 2025 · arXiv · 0 citations
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
ICLR 2025 · arXiv · 0 citations
Self-diffusion for Solving Inverse Problems
NeurIPS 2025 · arXiv · 0 citations
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
ICLR 2025 · arXiv · 0 citations
Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders
NeurIPS 2025 · arXiv · 0 citations
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
ECCV 2024 · arXiv · 0 citations
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
NeurIPS 2025 · arXiv · 0 citations
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
ECCV 2024 · 0 citations
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025 · arXiv · 0 citations
Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback
NeurIPS 2025 · arXiv · 0 citations
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
NeurIPS 2025 · arXiv · 0 citations
Differentiable extensions with rounding guarantees for combinatorial optimization over permutations
NeurIPS 2025 · arXiv · 0 citations
Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining
ECCV 2024 · 0 citations
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
NeurIPS 2025 · arXiv · 0 citations
On Inductive Biases That Enable Generalization in Diffusion Transformers
NeurIPS 2025 · arXiv · 0 citations
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
ICLR 2025 · arXiv · 0 citations
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
ICLR 2025 · arXiv · 0 citations
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
ECCV 2024 · arXiv · 0 citations
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
CVPR 2025 · 0 citations
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
ICLR 2025 · arXiv · 0 citations
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
NeurIPS 2025 · arXiv · 0 citations
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
NeurIPS 2025 · arXiv · 0 citations
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
ICLR 2025 · arXiv · 0 citations
Geometric Algorithms for Neural Combinatorial Optimization with Constraints
NeurIPS 2025 · arXiv · 0 citations
Multi-Agent Collaboration via Evolving Orchestration
NeurIPS 2025 · arXiv · 0 citations
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
ICLR 2025 · arXiv · 0 citations
Don’t Forget the Enjoin: FocalLoRA for Instruction Hierarchical Alignment in Large Language Models
NeurIPS 2025 · 0 citations
CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
NeurIPS 2025 · arXiv · 0 citations
MobileNetV4: Universal Models for the Mobile Ecosystem
ECCV 2024 · arXiv · 0 citations