Representation Learning

Learning useful data representations

100 papers, 10,063 total citations

Feb '24 to Jan '26: 2218 papers

Top Conferences

ICLR: 34, CVPR: 31, ECCV: 16, AAAI: 11, ICML: 5, NeurIPS: 2

Top Papers

#1

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024, arXiv:2402.13616

programmable gradient information, information bottleneck, reversible functions, gradient path planning (+4 more)

2,952 citations

#2

YaRN: Efficient Context Window Extension of Large Language Models

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Yunyang Xiong, Balakrishnan Varadarajan, Lemeng Wu et al.

Data Filtering Networks

Alex Fang, Albin Madappally Jose, Amit Jain et al.

Demystifying CLIP Data

Hu Xu, Saining Xie, Xiaoqing Tan et al.

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Le Xue, Ning Yu, Shu Zhang et al.

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

Revisiting Feature Prediction for Learning Visual Representations from Video

Quentin Garrido, Yann LeCun, Michael Rabbat et al.

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024, arXiv:2308.07707

machine unlearning, selective synaptic dampening, fisher information matrix, post hoc unlearning (+3 more)

170 citations

#12

Uni3D: Exploring Unified 3D Representation at Scale

Junsheng Zhou, Jinsheng Wang, Baorui Ma et al.

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi, Jaechan Lee, Yangsibo Huang et al.

ICLR 2025, arXiv:2407.06460

machine unlearning, language models, privacy leakage, verbatim memorization (+4 more)

157 citations

#14

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.

Decoding Natural Images from EEG for Object Recognition

Yonghao Song, Bingchuan Liu, Xiang Li et al.

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Xinlei Chen, Zhuang Liu, Saining Xie et al.

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang, Yiming Ren, Haowen Luo et al.

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Jingfeng Wu, Difan Zou, Zixiang Chen et al.

Making Text Embedders Few-Shot Learners

Chaofan Li, Minghao Qin, Shitao Xiao et al.

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

Guosheng Zhao, Chaojun Ni, Xiaofeng Wang et al.

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution

Zhiyuan You, Xin Cai, Jinjin Gu et al.

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025, arXiv:2411.14257

sparse autoencoders, hallucination mechanisms, entity recognition, knowledge awareness (+3 more)

77 citations

#30

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

Learning to Act without Actions

Dominik Schmidt, Minqi Jiang

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Yizhi Song, Zhifei Zhang, Zhe Lin et al.

Grokking as the transition from lazy to rich training dynamics

Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.

ICLR 2025, arXiv:2409.07703

data science agents, large language models, large vision-language models, data analysis tasks (+4 more)

62 citations

#40

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024, arXiv:2312.10726

masked autoencoders, 3d representation learning, point cloud pre-training, transformer encoder (+4 more)

61 citations

#41

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang et al.

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024, arXiv:2312.01878

graph neural networks, heterogeneous graph representation, few-shot learning, prompt learning (+4 more)

59 citations

#43

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Guangkai Xu, Yongtao Ge, Mingyu Liu et al.

ICLR 2025, arXiv:2403.06090

diffusion models, dense perception tasks, monocular depth estimation, surface normal estimation (+4 more)

56 citations

#45

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024, arXiv:2407.20229

3d gaussian representation, semantic feature lifting, 3d-aware fine-tuning, 2d foundation models (+4 more)

55 citations

#46

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang et al.

ECCV 2024, arXiv:2404.11593

inverse rendering, material recovery, diffusion priors, unknown illumination (+4 more)

54 citations

#47

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Yue Han, Junwei Zhu, Keke He et al.

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Shitong Shao, Zeyuan Yin, Muxin Zhou et al.

GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking

Shu Yin, Peican Zhu, Lianwei Wu et al.

AAAI 2024, arXiv:2312.05739

fake news detection, graph autoencoder, unsupervised learning, self-supervised learning (+4 more)

53 citations

#50

A Decade's Battle on Dataset Bias: Are We There Yet?

Zhuang Liu, Kaiming He

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

Miltiadis (Miltos) Kofinas, Boris Knyazev, Yan Zhang et al.

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng, Bowen Wen, Jonathan Tremblay et al.

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

Yufeng Huang, Jiji Tang, Zhuo Chen et al.

AAAI 2024, arXiv:2305.06152

scene graph knowledge, multi-modal structured representations, vision-language pre-training, image-text matching (+3 more)

49 citations

#55

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim et al.

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025, arXiv:2406.11011

data attribution, data shapley, foundation model pretraining, generative ai copyright (+3 more)

44 citations

#63

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.

Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

Linlan Huang, Xusheng Cao, Haori Lu et al.

ECCV 2024, arXiv:2407.14143

class-incremental learning, vision-language pre-training, representation adjustment, parameter fusion (+3 more)

41 citations

#67

Does CLIP’s generalization performance mainly stem from high train-test similarity?

Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak et al.

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025, arXiv:2504.01017

visual self-supervised learning, contrastive language-image pretraining, multimodal representation learning, vision encoders (+2 more)

39 citations

#69

PolyGCL: Graph Contrastive Learning via Learnable Spectral Polynomial Filters

Jingyu Chen, Runlin Lei, Zhewei Wei

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

Disentangled Prompt Representation for Domain Generalization

De Cheng, Zhipeng Xu, Xinyang Jiang et al.

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

Quan Zhang, Xiaoyu Liu, Wei Li et al.

Do Generated Data Always Help Contrastive Learning?

Yifei Wang, Jizhe Zhang, Yisen Wang

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

Yunpeng Qu, Kun Yuan, Kai Zhao et al.

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification

Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.

AAAI 2024, arXiv:2312.16797

person re-identification, cross-modal alignment, prompt learning, attribute descriptions (+3 more)

33 citations

#78

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

Spencer Frei, Niladri Chatterji, Peter L. Bartlett

Rethinking Graph Masked Autoencoders through Alignment and Uniformity

Liang Wang, Xiang Tao, Qiang Liu et al.

AAAI 2024, arXiv:2402.07225

graph masked autoencoders, graph contrastive learning, self-supervised learning, alignment and uniformity (+3 more)

32 citations

#80

Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space

Chengyang Hu, Ke-Yue Zhang, Taiping Yao et al.

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

Yubin Xiao, Di Wang, Boyang Li et al.

AAAI 2024, arXiv:2312.12469

knowledge distillation, autoregressive models, non-autoregressive models, vehicle routing problems (+2 more)

31 citations

#83

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024, arXiv:2407.20337

contrastive learning, deepfake detection, diffusion models, global-local similarities (+3 more)

31 citations

#85

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

Self-Supervised Facial Representation Learning with Facial Region Awareness

Zheng Gao, Ioannis Patras

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

Ruihai Wu, Haoran Lu, Yiyan Wang et al.

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

Yushuo Chen, Zerong Zheng, Zhe Li et al.

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

Liren He, Zhengkai Jiang, Jinlong Peng et al.

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NeurIPS 2025, arXiv:2507.01467

diffusion models, representation entanglement, denoising networks, image generation (+3 more)

27 citations

#95

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Yizhe Xiong, Hui Chen, Tianxiang Hao et al.

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Guozheng Ma, Lu Li, Sen Zhang et al.

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Chengyao Wang, Li Jiang, Xiaoyang Wu et al.

DTL: Disentangled Transfer Learning for Visual Recognition

Minghao Fu, Ke Zhu, Jianxin Wu

AAAI 2024, arXiv:2312.07856

parameter-efficient transfer learning, visual recognition, gpu memory reduction, disentangled representation learning (+4 more)

25 citations

#99

SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Josef Kittler et al.

AAAI 2024, arXiv:2309.05834

skeleton-based action recognition, contrastive learning, spatiotemporal disentanglement, masked image modeling (+4 more)

24 citations

#100

Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking

Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu

CVPR 2024

24 citations
