Yibing Song

39
Papers
361
Total Citations

Papers (39)

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

ICCV 2025
338
citations

Image Inpainting via Iteratively Decoupled Probabilistic Modeling

ICLR 2024
17
citations

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

NeurIPS 2025
3
citations

Re-Aligning Language to Visual Objects with an Agentic Workflow

ICLR 2025
3
citations

Advancing Textual Prompt Learning with Anchored Attributes

ICCV 2025
0
citations

Image Correction via Deep Reciprocating HDR Transformation

CVPR 2018
0
citations

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

CVPR 2018
0
citations

VITAL: VIsual Tracking via Adversarial Learning

CVPR 2018
0
citations

MVF-Net: Multi-View 3D Face Morphable Model Regression

CVPR 2019
0
citations

Unsupervised Deep Tracking

CVPR 2019
0
citations

VideoMoCo: Contrastive Video Representation Learning With Temporally Adversarial Examples

CVPR 2021
0
citations

Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On

CVPR 2021
0
citations

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

CVPR 2021
0
citations

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

CVPR 2021
0
citations

Parser-Free Virtual Try-On via Distilling Appearance Flows

CVPR 2021
0
citations

DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls

CVPR 2021
0
citations

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

CVPR 2021
0
citations

Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

CVPR 2022
0
citations

Improved Test-Time Adaptation for Domain Generalization

CVPR 2023
0
citations

Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

CVPR 2023
0
citations

Advancing Visual Grounding With Scene Knowledge: Benchmark and Method

CVPR 2023
0
citations

CREST: Convolutional Residual Learning for Visual Tracking

ICCV 2017
0
citations

Domain Generalization via Rationale Invariance

ICCV 2023
0
citations

Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation

ICCV 2023
0
citations

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

ICCV 2023
0
citations

Efficient Video Action Detection with Token Dropout and Context Refinement

ICCV 2023
0
citations

DiffusionDet: Diffusion Model for Object Detection

ICCV 2023
0
citations

Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations

ECCV 2020
0
citations

Rethinking Image Deraining via Rain Streaks and Vapors

ECCV 2020
0
citations

Robust Tracking against Adversarial Attacks

ECCV 2020
0
citations

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

CVPR 2025
0
citations

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

CVPR 2025
0
citations

AvatarArtist: Open-Domain 4D Avatarization

CVPR 2025
0
citations

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

CVPR 2025
0
citations

Deep Attentive Tracking via Reciprocative Learning

NeurIPS 2018
0
citations

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

NeurIPS 2022
0
citations

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

NeurIPS 2022
0
citations

OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training

NeurIPS 2022
0
citations

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

NeurIPS 2022
0
citations