Juncheng Li
28
Papers
42
Total Citations
Papers (28)
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
18
citations
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
ICCV 2025
10
citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
NeurIPS 2025
6
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
4
citations
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
ICML 2025
4
citations
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
CVPR 2025
0
citations
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
ICCV 2025
0
citations
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
CVPR 2025
0
citations
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV 2025
0
citations
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
0
citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
0
citations
DIEM: Decomposition-Integration Enhancing Multimodal Insights
CVPR 2024
0
citations
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
CVPR 2024
0
citations
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
CVPR 2024
0
citations
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
ICCV 2021
0
citations
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
ICCV 2023
0
citations
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023
0
citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
ICCV 2025
0
citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
0
citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
0
citations
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation
CVPR 2020
0
citations
Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning
CVPR 2022
0
citations
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning
CVPR 2023
0
citations
Structure-Preserving Deraining With Residue Channel Prior Guidance
ICCV 2021
0
citations
Adversarial camera stickers: A physical camera-based attack on deep learning systems
ICML 2019
0
citations
Adversarial Music: Real world Audio Adversary against Wake-word Detection System
NeurIPS 2019
0
citations
Fine-Grained Semantically Aligned Vision-Language Pre-Training
NeurIPS 2022
0
citations
Masked Autoencoders that Listen
NeurIPS 2022
0
citations