Chen Sun

52 Papers
352 Total Citations

Papers (52)

Actor-centric Relation Network

ECCV 2018
232 citations

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

ICLR 2024
81 citations

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

CVPR 2025
20 citations

Solving New Tasks by Adapting Internet Video Knowledge

ICLR 2025
12 citations

Dense Video Object Captioning from Disjoint Supervision

ICLR 2025 · arXiv
7 citations

Self-Correcting Self-Consuming Loops for Generative Model Training

ICML 2024
0 citations

Potential Based Diffusion Motion Planning

ICML 2024
0 citations

ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks

CVPR 2016
0 citations

MotiF: Making Text Count in Image Animation with Motion Focal Loss

CVPR 2025
0 citations

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning

CVPR 2018 · arXiv
0 citations

AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions

CVPR 2018 · arXiv
0 citations

The iNaturalist Species Classification and Detection Dataset

CVPR 2018 · arXiv
0 citations

Relational Action Forecasting

CVPR 2019
0 citations

Composing Text and Image for Image Retrieval - an Empirical Odyssey

CVPR 2019
0 citations

Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior

CVPR 2019
0 citations

DNU: Deep Non-Local Unrolling for Computational Spectral Imaging

CVPR 2020
0 citations

Speech2Action: Cross-Modal Supervision for Action Recognition

CVPR 2020 · arXiv
0 citations

VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation

CVPR 2020 · arXiv
0 citations

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

CVPR 2021
0 citations

Multiview Transformers for Video Recognition

CVPR 2022 · arXiv
0 citations

REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory

CVPR 2023 · arXiv
0 citations

How Can Objects Help Action Recognition?

CVPR 2023
0 citations

Automatic Concept Discovery From Parallel Text and Visual Corpora

ICCV 2015
0 citations

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

ICCV 2017 · arXiv
0 citations

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

ICCV 2017 · arXiv
0 citations

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

ICCV 2017 · arXiv
0 citations

TALL: Temporal Activity Localization via Language Query

ICCV 2017 · arXiv
0 citations

VideoBERT: A Joint Model for Video and Language Representation Learning

ICCV 2019
0 citations

Composable Augmentation Encoding for Video Representation Learning

ICCV 2021 · arXiv
0 citations

DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets

ICCV 2021 · arXiv
0 citations

Episodic Transformer for Vision-and-Language Navigation

ICCV 2021 · arXiv
0 citations

ViViT: A Video Vision Transformer

ICCV 2021 · arXiv
0 citations

Learning Temporal Dynamics From Cycles in Narrated Video

ICCV 2021 · arXiv
0 citations

Unified Graph Structured Models for Video Understanding

ICCV 2021 · arXiv
0 citations

Multi-modal Transformer for Video Retrieval

ECCV 2020
0 citations

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

ECCV 2020
0 citations

Learning Audio-Video Modalities from Image Captions

ECCV 2022
0 citations

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

ECCV 2022
0 citations

Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors

CVPR 2017 · arXiv
0 citations

Motion Prompting: Controlling Video Generation with Motion Trajectories

CVPR 2025
0 citations

How Can Objects Help Video-Language Understanding?

ICCV 2025
0 citations

End-to-End Spatio-Temporal Action Localisation with Video Transformers

CVPR 2024
0 citations

Pixel-Aligned Language Model

CVPR 2024
0 citations

Unsupervised learning of object structure and dynamics from videos

NeurIPS 2019
0 citations

What Makes for Good Views for Contrastive Learning?

NeurIPS 2020
0 citations

Discrete-Valued Neural Communication

NeurIPS 2021
0 citations

Attention Bottlenecks for Multimodal Fusion

NeurIPS 2021
0 citations

Trajectory balance: Improved credit assignment in GFlowNets

NeurIPS 2022
0 citations

AVIS: Autonomous Visual Information Seeking with Large Language Model Agent

NeurIPS 2023
0 citations

Does Visual Pretraining Help End-to-End Reasoning?

NeurIPS 2023
0 citations

Goal-Conditioned Predictive Coding for Offline Reinforcement Learning

NeurIPS 2023
0 citations

Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

NeurIPS 2023
0 citations