Jiebo Luo

81
Papers
324
Total Citations

Papers (81)

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

AAAI 2024 (arXiv)
110
citations

Adaptive Offline Quintuplet Loss for Image-Text Matching

ECCV 2020
75
citations

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

AAAI 2025
47
citations

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

NeurIPS 2025
25
citations

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

CVPR 2025
17
citations

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

ECCV 2024
14
citations

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

CVPR 2025
12
citations

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

AAAI 2025
10
citations

Mixture of Weak and Strong Experts on Graphs

ICLR 2024
10
citations

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

ICCV 2025
4
citations

Improving Pairwise Ranking for Multi-Label Image Classification

CVPR 2017 (arXiv)
0
citations

Deep Multimodal Representation Learning From Temporal Data

CVPR 2017 (arXiv)
0
citations

Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks

CVPR 2018 (arXiv)
0
citations

VizWiz Grand Challenge: Answering Visual Questions From Blind People

CVPR 2018 (arXiv)
0
citations

DOTA: A Large-Scale Dataset for Object Detection in Aerial Images

CVPR 2018 (arXiv)
0
citations

End-to-End Convolutional Semantic Embeddings

CVPR 2018
0
citations

Gaussian Temporal Awareness Networks for Action Localization

CVPR 2019
0
citations

Spatio-Temporal Video Re-Localization by Warp LSTM

CVPR 2019
0
citations

AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations Rather Than Data

CVPR 2019
0
citations

Attentive Relational Networks for Mapping Images to Scene Graphs

CVPR 2019
0
citations

Unsupervised Image Captioning

CVPR 2019
0
citations

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

CVPR 2019
0
citations

Foreground-Aware Image Inpainting

CVPR 2019
0
citations

Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning

CVPR 2019
0
citations

DuDoNet: Dual Domain Network for CT Metal Artifact Reduction

CVPR 2019
0
citations

Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation

CVPR 2019
0
citations

Fine-Grained Image-to-Image Transformation Towards Visual Recognition

CVPR 2020 (arXiv)
0
citations

On Vocabulary Reliance in Scene Text Recognition

CVPR 2020 (arXiv)
0
citations

Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection

CVPR 2020 (arXiv)
0
citations

Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning

CVPR 2020
0
citations

TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning

CVPR 2020 (arXiv)
0
citations

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows

CVPR 2021 (arXiv)
0
citations

Structured Multi-Level Interaction Network for Video Moment Localization via Language Query

CVPR 2021
0
citations

Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship

CVPR 2021
0
citations

Group-aware Label Transfer for Domain Adaptive Person Re-identification

CVPR 2021 (arXiv)
0
citations

TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption

CVPR 2021 (arXiv)
0
citations

Localized Adversarial Domain Generalization

CVPR 2022 (arXiv)
0
citations

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing

CVPR 2022
0
citations

Stand-Alone Inter-Frame Attention in Video Models

CVPR 2022
0
citations

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

CVPR 2022 (arXiv)
0
citations

Automatic Relation-Aware Graph Network Proliferation

CVPR 2022 (arXiv)
0
citations

AnchorFormer: Point Cloud Completion From Discriminative Nodes

CVPR 2023
0
citations

QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity

CVPR 2023 (arXiv)
0
citations

Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning

CVPR 2023
0
citations

Meta-Causal Learning for Single Domain Generalization

CVPR 2023 (arXiv)
0
citations

Stare at What You See: Masked Image Modeling Without Reconstruction

CVPR 2023 (arXiv)
0
citations

Semantic Video Entity Linking Based on Visual Content and Metadata

ICCV 2015
0
citations

Learning From Noisy Labels With Distillation

ICCV 2017 (arXiv)
0
citations

Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition

ICCV 2017
0
citations

A Fast and Accurate One-Stage Approach to Visual Grounding

ICCV 2019
0
citations

Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning

ICCV 2019
0
citations

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

CVPR 2025
0
citations

Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization

ICCV 2021 (arXiv)
0
citations

Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

ICCV 2021 (arXiv)
0
citations

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

ICCV 2021 (arXiv)
0
citations

Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning

ICCV 2021 (arXiv)
0
citations

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3

ICCV 2023
0
citations

Spatial-Aware Token for Weakly Supervised Object Localization

ICCV 2023 (arXiv)
0
citations

Grounding 3D Object Affordance from 2D Interactions in Images

ICCV 2023 (arXiv)
0
citations

Learning to Localize Actions from Moments

ECCV 2020
0
citations

TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

ECCV 2020
0
citations

Structured Landmark Detection via Topology-Adapting Deep Graph Learning

ECCV 2020
0
citations

Improving One-stage Visual Grounding by Recursive Sub-query Construction

ECCV 2020
0
citations

Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision

ECCV 2020
0
citations

Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

ECCV 2020
0
citations

Image Inpainting with Cascaded Modulation GAN and Object-Aware Training

ECCV 2022
0
citations

Large-Scale Tag-Based Font Retrieval With Generative Feature Learning

ICCV 2019
0
citations

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

ICCV 2025
0
citations

Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training

ICCV 2025
0
citations

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

ICCV 2025
0
citations

Aligning Global Semantics and Local Textures in Generative Video Enhancement

ICCV 2025
0
citations

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

CVPR 2024
0
citations

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

CVPR 2024
0
citations

Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection

CVPR 2015
0
citations

TGIF: A New Dataset and Benchmark on Animated GIF Description

CVPR 2016
0
citations

Image Captioning With Semantic Attention

CVPR 2016
0
citations

Learning Deep Bilinear Transformation for Fine-grained Image Representation

NeurIPS 2019
0
citations

Learning Semantic-aware Normalization for Generative Adversarial Networks

NeurIPS 2020
0
citations

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

NeurIPS 2021
0
citations

Multi-modal Dependency Tree for Video Captioning

NeurIPS 2021
0
citations

Wyze Rule: Federated Rule Dataset for Rule Recommendation Benchmarking

NeurIPS 2023
0
citations