Hongtao Xie
19
Papers
132
Total Citations
Papers (19)
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
AAAI 2024, arXiv
40
citations
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
CVPR 2024
36
citations
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
CVPR 2025
21
citations
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
CVPR 2025
14
citations
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
CVPR 2025
8
citations
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
ECCV 2024
5
citations
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
NeurIPS 2025
3
citations
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
CVPR 2025
2
citations
GRIP: A Graph-Based Reasoning Instruction Producer
NeurIPS 2025, arXiv
2
citations
IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation
AAAI 2025
1
citations
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
CVPR 2024
0
citations
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
ICCV 2025
0
citations
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
ICCV 2025
0
citations
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
ICCV 2025
0
citations
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
ICCV 2025
0
citations
IGD: Instructional Graphic Design with Multimodal Layer Generation
ICCV 2025
0
citations
Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts
ICCV 2025
0
citations
OTE: Exploring Accurate Scene Text Recognition Using One Token
CVPR 2024
0
citations
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
CVPR 2024
0
citations