Chen Sun
52 Papers
352 Total Citations
Papers (52)
Actor-centric Relation Network
ECCV 2018
232 citations
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
ICLR 2024
81 citations
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
CVPR 2025
20 citations
Solving New Tasks by Adapting Internet Video Knowledge
ICLR 2025
12 citations
Dense Video Object Captioning from Disjoint Supervision
ICLR 2025 · arXiv
7 citations
Self-Correcting Self-Consuming Loops for Generative Model Training
ICML 2024
0 citations
Potential Based Diffusion Motion Planning
ICML 2024
0 citations
ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks
CVPR 2016
0 citations
MotiF: Making Text Count in Image Animation with Motion Focal Loss
CVPR 2025
0 citations
Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
CVPR 2018 · arXiv
0 citations
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
CVPR 2018 · arXiv
0 citations
The iNaturalist Species Classification and Detection Dataset
CVPR 2018 · arXiv
0 citations
Relational Action Forecasting
CVPR 2019
0 citations
Composing Text and Image for Image Retrieval - an Empirical Odyssey
CVPR 2019
0 citations
Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior
CVPR 2019
0 citations
DNU: Deep Non-Local Unrolling for Computational Spectral Imaging
CVPR 2020
0 citations
Speech2Action: Cross-Modal Supervision for Action Recognition
CVPR 2020 · arXiv
0 citations
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation
CVPR 2020 · arXiv
0 citations
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
CVPR 2021
0 citations
Multiview Transformers for Video Recognition
CVPR 2022 · arXiv
0 citations
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
CVPR 2023 · arXiv
0 citations
How Can Objects Help Action Recognition?
CVPR 2023
0 citations
Automatic Concept Discovery From Parallel Text and Visual Corpora
ICCV 2015
0 citations
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
ICCV 2017 · arXiv
0 citations
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
ICCV 2017 · arXiv
0 citations
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
ICCV 2017 · arXiv
0 citations
TALL: Temporal Activity Localization via Language Query
ICCV 2017 · arXiv
0 citations
VideoBERT: A Joint Model for Video and Language Representation Learning
ICCV 2019
0 citations
Composable Augmentation Encoding for Video Representation Learning
ICCV 2021 · arXiv
0 citations
DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets
ICCV 2021 · arXiv
0 citations
Episodic Transformer for Vision-and-Language Navigation
ICCV 2021 · arXiv
0 citations
ViViT: A Video Vision Transformer
ICCV 2021 · arXiv
0 citations
Learning Temporal Dynamics From Cycles in Narrated Video
ICCV 2021 · arXiv
0 citations
Unified Graph Structured Models for Video Understanding
ICCV 2021 · arXiv
0 citations
Multi-modal Transformer for Video Retrieval
ECCV 2020
0 citations
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
ECCV 2020
0 citations
Learning Audio-Video Modalities from Image Captions
ECCV 2022
0 citations
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
ECCV 2022
0 citations
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
CVPR 2017 · arXiv
0 citations
Motion Prompting: Controlling Video Generation with Motion Trajectories
CVPR 2025
0 citations
How Can Objects Help Video-Language Understanding?
ICCV 2025
0 citations
End-to-End Spatio-Temporal Action Localisation with Video Transformers
CVPR 2024
0 citations
Pixel-Aligned Language Model
CVPR 2024
0 citations
Unsupervised learning of object structure and dynamics from videos
NeurIPS 2019
0 citations
What Makes for Good Views for Contrastive Learning?
NeurIPS 2020
0 citations
Discrete-Valued Neural Communication
NeurIPS 2021
0 citations
Attention Bottlenecks for Multimodal Fusion
NeurIPS 2021
0 citations
Trajectory balance: Improved credit assignment in GFlowNets
NeurIPS 2022
0 citations
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
NeurIPS 2023
0 citations
Does Visual Pretraining Help End-to-End Reasoning?
NeurIPS 2023
0 citations
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
NeurIPS 2023
0 citations
Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL
NeurIPS 2023
0 citations