Juncheng Li

28
Papers
42
Total Citations

Papers (28)

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

CVPR 2025
18
citations

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

ICCV 2025
10
citations

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

NeurIPS 2025
6
citations

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

CVPR 2025
4
citations

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark

ICML 2025
4
citations

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

CVPR 2025
0
citations

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

ICCV 2025
0
citations

STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

CVPR 2025
0
citations

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

ICCV 2025
0
citations

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

ICCV 2025
0
citations

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

CVPR 2025
0
citations

DIEM: Decomposition-Integration Enhancing Multimodal Insights

CVPR 2024
0
citations

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

CVPR 2024
0
citations

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

CVPR 2024
0
citations

Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference

ICCV 2021 (arXiv)
0
citations

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

ICCV 2023 (arXiv)
0
citations

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

ICCV 2023 (arXiv)
0
citations

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

ICCV 2025
0
citations

Auto-Encoding Morph-Tokens for Multimodal LLM

ICML 2024
0
citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

ICML 2024
0
citations

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

CVPR 2020 (arXiv)
0
citations

Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning

CVPR 2022 (arXiv)
0
citations

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

CVPR 2023
0
citations

Structure-Preserving Deraining With Residue Channel Prior Guidance

ICCV 2021 (arXiv)
0
citations

Adversarial camera stickers: A physical camera-based attack on deep learning systems

ICML 2019
0
citations

Adversarial Music: Real world Audio Adversary against Wake-word Detection System

NeurIPS 2019
0
citations

Fine-Grained Semantically Aligned Vision-Language Pre-Training

NeurIPS 2022
0
citations

Masked Autoencoders that Listen

NeurIPS 2022
0
citations