Jiebo Luo
81 Papers
324 Total Citations
Papers (81)
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
AAAI 2024
110
citations
Adaptive Offline Quintuplet Loss for Image-Text Matching
ECCV 2020
75
citations
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
47
citations
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
NeurIPS 2025
25
citations
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
CVPR 2025
17
citations
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
ECCV 2024
14
citations
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
CVPR 2025
12
citations
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
AAAI 2025
10
citations
Mixture of Weak and Strong Experts on Graphs
ICLR 2024
10
citations
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
ICCV 2025
4
citations
Improving Pairwise Ranking for Multi-Label Image Classification
CVPR 2017
0
citations
Deep Multimodal Representation Learning From Temporal Data
CVPR 2017
0
citations
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
CVPR 2018
0
citations
VizWiz Grand Challenge: Answering Visual Questions From Blind People
CVPR 2018
0
citations
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
CVPR 2018
0
citations
End-to-End Convolutional Semantic Embeddings
CVPR 2018
0
citations
Gaussian Temporal Awareness Networks for Action Localization
CVPR 2019
0
citations
Spatio-Temporal Video Re-Localization by Warp LSTM
CVPR 2019
0
citations
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations Rather Than Data
CVPR 2019
0
citations
Attentive Relational Networks for Mapping Images to Scene Graphs
CVPR 2019
0
citations
Unsupervised Image Captioning
CVPR 2019
0
citations
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition
CVPR 2019
0
citations
Foreground-Aware Image Inpainting
CVPR 2019
0
citations
Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning
CVPR 2019
0
citations
DuDoNet: Dual Domain Network for CT Metal Artifact Reduction
CVPR 2019
0
citations
Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation
CVPR 2019
0
citations
Fine-Grained Image-to-Image Transformation Towards Visual Recognition
CVPR 2020
0
citations
On Vocabulary Reliance in Scene Text Recognition
CVPR 2020
0
citations
Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection
CVPR 2020
0
citations
Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning
CVPR 2020
0
citations
TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning
CVPR 2020
0
citations
ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
CVPR 2021
0
citations
Structured Multi-Level Interaction Network for Video Moment Localization via Language Query
CVPR 2021
0
citations
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
CVPR 2021
0
citations
Group-aware Label Transfer for Domain Adaptive Person Re-identification
CVPR 2021
0
citations
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
CVPR 2021
0
citations
Localized Adversarial Domain Generalization
CVPR 2022
0
citations
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
CVPR 2022
0
citations
Stand-Alone Inter-Frame Attention in Video Models
CVPR 2022
0
citations
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
CVPR 2022
0
citations
Automatic Relation-Aware Graph Network Proliferation
CVPR 2022
0
citations
AnchorFormer: Point Cloud Completion From Discriminative Nodes
CVPR 2023
0
citations
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
CVPR 2023
0
citations
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
CVPR 2023
0
citations
Meta-Causal Learning for Single Domain Generalization
CVPR 2023
0
citations
Stare at What You See: Masked Image Modeling Without Reconstruction
CVPR 2023
0
citations
Semantic Video Entity Linking Based on Visual Content and Metadata
ICCV 2015
0
citations
Learning From Noisy Labels With Distillation
ICCV 2017
0
citations
Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition
ICCV 2017
0
citations
A Fast and Accurate One-Stage Approach to Visual Grounding
ICCV 2019
0
citations
Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning
ICCV 2019
0
citations
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025
0
citations
Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
ICCV 2021
0
citations
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
ICCV 2021
0
citations
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
ICCV 2021
0
citations
Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning
ICCV 2021
0
citations
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3
ICCV 2023
0
citations
Spatial-Aware Token for Weakly Supervised Object Localization
ICCV 2023
0
citations
Grounding 3D Object Affordance from 2D Interactions in Images
ICCV 2023
0
citations
Learning to Localize Actions from Moments
ECCV 2020
0
citations
TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
ECCV 2020
0
citations
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
ECCV 2020
0
citations
Improving One-stage Visual Grounding by Recursive Sub-query Construction
ECCV 2020
0
citations
Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision
ECCV 2020
0
citations
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification
ECCV 2020
0
citations
Image Inpainting with Cascaded Modulation GAN and Object-Aware Training
ECCV 2022
0
citations
Large-Scale Tag-Based Font Retrieval With Generative Feature Learning
ICCV 2019
0
citations
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
ICCV 2025
0
citations
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
ICCV 2025
0
citations
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
ICCV 2025
0
citations
Aligning Global Semantics and Local Textures in Generative Video Enhancement
ICCV 2025
0
citations
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
CVPR 2024
0
citations
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
CVPR 2024
0
citations
Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection
CVPR 2015
0
citations
TGIF: A New Dataset and Benchmark on Animated GIF Description
CVPR 2016
0
citations
Image Captioning With Semantic Attention
CVPR 2016
0
citations
Learning Deep Bilinear Transformation for Fine-grained Image Representation
NeurIPS 2019
0
citations
Learning Semantic-aware Normalization for Generative Adversarial Networks
NeurIPS 2020
0
citations
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
NeurIPS 2021
0
citations
Multi-modal Dependency Tree for Video Captioning
NeurIPS 2021
0
citations
Wyze Rule: Federated Rule Dataset for Rule Recommendation Benchmarking
NeurIPS 2023
0
citations