Tong Lu

28 Papers
2,560 Total Citations

Papers (28)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

CVPR 2024
2,210 citations

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

CVPR 2024
169 citations

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

ICLR 2024
118 citations

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

ICLR 2025
39 citations

Docopilot: Improving Multimodal Models for Document-Level Understanding

CVPR 2025
14 citations

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs

NeurIPS 2025
10 citations

RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation

CVPR 2024
0 citations

Temporal Action Localization by Structured Maximal Sums

CVPR 2017 · arXiv
0 citations

Shape Robust Text Detection With Progressive Scale Expansion Network

CVPR 2019
0 citations

InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions

CVPR 2023 · arXiv
0 citations

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network

ICCV 2019
0 citations

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions

ICCV 2021 · arXiv
0 citations

TAM: Temporal Adaptive Module for Video Recognition

ICCV 2021 · arXiv
0 citations

Adaptive Graph Convolution for Point Cloud Analysis

ICCV 2021 · arXiv
0 citations

Memory-and-Anticipation Transformer for Online Action Understanding

ICCV 2023 · arXiv
0 citations

FB-BEV: BEV Representation from Forward-Backward View Transformations

ICCV 2023
0 citations

DDP: Diffusion Model for Dense Visual Prediction

ICCV 2023 · arXiv
0 citations

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

ECCV 2020
0 citations

SeedFormer: Patch Seeds Based Point Cloud Completion with Upsample Transformer

ECCV 2022
0 citations

BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

ECCV 2022
0 citations

Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers

CVPR 2022 · arXiv
0 citations

MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration

ICCV 2025
0 citations

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

AAAI 2025
0 citations

AVSegFormer: Audio-Visual Segmentation with Transformer

AAAI 2024 · arXiv
0 citations

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

AAAI 2024
0 citations

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024
0 citations

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

NeurIPS 2021
0 citations

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

NeurIPS 2023
0 citations