Most Cited 2024 Papers by Hongsheng Li

14 papers found

#1

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024 poster, arXiv:2403.14624
487 citations
#2

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Linjiang Huang, Rongyao Fang, Aiping Zhang et al.

ECCV 2024 poster, arXiv:2403.12963
51 citations
#3

Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding

Yiwen Tang, Renrui Zhang, Jiaming Liu et al.

ECCV 2024 poster
19 citations
#4

Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos

Keqiang Sun, Dori Litvak, Yunzhi Zhang et al.

ECCV 2024 poster, arXiv:2312.13604
10 citations
#5

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

Fu-Yun Wang, Zhaoyang Huang, Qiang Ma et al.

ECCV 2024 poster
9 citations
#6

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Yijin Li, Yichen Shen, Zhaoyang Huang et al.

ECCV 2024 poster, arXiv:2410.20451
7 citations
#7

Unmasking Bias in Diffusion Model Training

Hu Yu, Li Shen, Jie Huang et al.

ECCV 2024 poster, arXiv:2310.08442
7 citations
#8

Delving Deep into Engagement Prediction of Short Videos

Dasong Li, Wenjie Li, Baili Lu et al.

ECCV 2024 poster, arXiv:2410.00289
6 citations
#9

nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding

Benjin Zhu, Zhe Wang, Hongsheng Li

ECCV 2024 poster
5 citations
#10

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models

Ziyi Lin, Dongyang Liu, Renrui Zhang et al.

ECCV 2024 poster
#11

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.

ECCV 2024 poster, arXiv:2403.13745
#12

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks

Manyuan Zhang, Guanglu Song, Xiaoyu Shi et al.

ECCV 2024 poster
#13

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.

ECCV 2024 poster, arXiv:2405.00760
#14

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Haiyang Wang, Hao Tang, Li Jiang et al.

ECCV 2024 poster, arXiv:2403.09394