Yibing Song
39 Papers
361 Total Citations
Papers (39)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
ICCV 2025
338
citations
Image Inpainting via Iteratively Decoupled Probabilistic Modeling
ICLR 2024
17
citations
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
NeurIPS 2025
3
citations
Re-Aligning Language to Visual Objects with an Agentic Workflow
ICLR 2025
3
citations
Advancing Textual Prompt Learning with Anchored Attributes
ICCV 2025
0
citations
Image Correction via Deep Reciprocating HDR Transformation
CVPR 2018
0
citations
Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks
CVPR 2018
0
citations
VITAL: VIsual Tracking via Adversarial Learning
CVPR 2018
0
citations
MVF-Net: Multi-View 3D Face Morphable Model Regression
CVPR 2019
0
citations
Unsupervised Deep Tracking
CVPR 2019
0
citations
VideoMoCo: Contrastive Video Representation Learning With Temporally Adversarial Examples
CVPR 2021
0
citations
Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On
CVPR 2021
0
citations
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
CVPR 2021
0
citations
ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
CVPR 2021
0
citations
Parser-Free Virtual Try-On via Distilling Appearance Flows
CVPR 2021
0
citations
DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls
CVPR 2021
0
citations
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
CVPR 2021
0
citations
Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
CVPR 2022
0
citations
Improved Test-Time Adaptation for Domain Generalization
CVPR 2023
0
citations
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
CVPR 2023
0
citations
Advancing Visual Grounding With Scene Knowledge: Benchmark and Method
CVPR 2023
0
citations
CREST: Convolutional Residual Learning for Visual Tracking
ICCV 2017
0
citations
Domain Generalization via Rationale Invariance
ICCV 2023
0
citations
Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation
ICCV 2023
0
citations
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
ICCV 2023
0
citations
Efficient Video Action Detection with Token Dropout and Context Refinement
ICCV 2023
0
citations
DiffusionDet: Diffusion Model for Object Detection
ICCV 2023
0
citations
Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations
ECCV 2020
0
citations
Rethinking Image Deraining via Rain Streaks and Vapors
ECCV 2020
0
citations
Robust Tracking against Adversarial Attacks
ECCV 2020
0
citations
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
CVPR 2025
0
citations
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
CVPR 2025
0
citations
AvatarArtist: Open-Domain 4D Avatarization
CVPR 2025
0
citations
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
CVPR 2025
0
citations
Deep Attentive Tracking via Reciprocative Learning
NeurIPS 2018
0
citations
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
NeurIPS 2022
0
citations
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
NeurIPS 2022
0
citations
OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training
NeurIPS 2022
0
citations
One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations
NeurIPS 2022
0
citations