Tong Lu
28
Papers
2,560
Total Citations
Papers (28)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
2,210
citations
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
CVPR 2024
169
citations
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
118
citations
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
39
citations
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
14
citations
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
NeurIPS 2025
10
citations
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
CVPR 2024
0
citations
Temporal Action Localization by Structured Maximal Sums
CVPR 2017
0
citations
Shape Robust Text Detection With Progressive Scale Expansion Network
CVPR 2019
0
citations
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
0
citations
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network
ICCV 2019
0
citations
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021
0
citations
TAM: Temporal Adaptive Module for Video Recognition
ICCV 2021
0
citations
Adaptive Graph Convolution for Point Cloud Analysis
ICCV 2021
0
citations
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023
0
citations
FB-BEV: BEV Representation from Forward-Backward View Transformations
ICCV 2023
0
citations
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
0
citations
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020
0
citations
SeedFormer: Patch Seeds Based Point Cloud Completion with Upsample Transformer
ECCV 2022
0
citations
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022
0
citations
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022
0
citations
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
ICCV 2025
0
citations
Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation
AAAI 2025
0
citations
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024
0
citations
CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers
AAAI 2024
0
citations
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
0
citations
Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution
NeurIPS 2021
0
citations
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023
0
citations