Most Cited 2024 Papers by Hongsheng Li
14 papers found
#1
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
ECCV 2024 (poster) · arXiv:2403.14624
487 citations
#2
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
ECCV 2024 (poster) · arXiv:2403.12963
51 citations
#3
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
Yiwen Tang, Renrui Zhang, Jiaming Liu et al.
ECCV 2024 (poster)
19 citations
#4
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
Keqiang Sun, Dori Litvak, Yunzhi Zhang et al.
ECCV 2024 (poster) · arXiv:2312.13604
10 citations
#5
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
Fu-Yun Wang, Zhaoyang Huang, Qiang Ma et al.
ECCV 2024 (poster)
9 citations
#6
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
Yijin Li, Yichen Shen, Zhaoyang Huang et al.
ECCV 2024 (poster) · arXiv:2410.20451
7 citations
#7
Unmasking Bias in Diffusion Model Training
Hu Yu, Li Shen, Jie Huang et al.
ECCV 2024 (poster) · arXiv:2310.08442
7 citations
#8
Delving Deep into Engagement Prediction of Short Videos
Dasong Li, Wenjie Li, Baili Lu et al.
ECCV 2024 (poster) · arXiv:2410.00289
6 citations
#9
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
Benjin Zhu, Zhe Wang, Hongsheng Li
ECCV 2024 (poster)
5 citations
#10
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Ziyi Lin, Dongyang Liu, Renrui Zhang et al.
ECCV 2024 (poster)
#11
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.
ECCV 2024 (poster) · arXiv:2403.13745
#12
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
Manyuan Zhang, Guanglu Song, Xiaoyu Shi et al.
ECCV 2024 (poster)
#13
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
ECCV 2024 (poster) · arXiv:2405.00760
#14
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang, Hao Tang, Li Jiang et al.
ECCV 2024 (poster) · arXiv:2403.09394