🧬Generative Models

Energy-Based Models

EBMs and contrastive divergence training

100 papers6,306 total citations
Compare with other topics
Feb '24 Jan '26886 papers
Also includes: energy-based models, ebm, contrastive divergence, score matching

Top Papers

#1

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024
996
citations
#2

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin, Zhelun Shi, Jiwen Yu et al.

ICML 2025
806
citations
#3

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick et al.

ICML 2025
329
citations
#4

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

ICLR 2024
187
citations
#5

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

CVPR 2024
176
citations
#6

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.

ICLR 2025arXiv:2406.04770
large language modelsautomated evaluation frameworkreal-world user queriespairwise comparison metrics+3
142
citations
#7

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng et al.

ICLR 2025
114
citations
#8

Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.

ICLR 2024
107
citations
#9

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.

ICML 2025
100
citations
#10

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, ziyin Zeng et al.

AAAI 2024arXiv:2308.10045
text-based person searchcontrastive language image pretrainingcross-modal retrievalvision-language pre-training+3
94
citations
#11

DEIM: DETR with Improved Matching for Fast Convergence

Shihua Huang, Zhichao Lu, Xiaodong Cun et al.

CVPR 2025
93
citations
#12

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Yuang Peng, Yuxin Cui, Haomiao Tang et al.

ICLR 2025
91
citations
#13

Towards Open-ended Visual Quality Comparison

Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.

ECCV 2024
91
citations
#14

When Attention Sink Emerges in Language Models: An Empirical View

Xiangming Gu, Tianyu Pang, Chao Du et al.

ICLR 2025arXiv:2410.10781
attention sink phenomenonlanguage model pre-trainingsoftmax normalizationkey biases+4
90
citations
#15

Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Yiding Lu, Yijie Lin, Mouxing Yang et al.

AAAI 2024arXiv:2308.11164
multi-view clusteringcontrastive learningfalse negative issuerandom walks+4
90
citations
#16

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024arXiv:2402.16897
multi-view learningconflictive instancesevidential learningopinion aggregation+2
88
citations
#17

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024
84
citations
#18

Consistency Models Made Easy

Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.

ICLR 2025
81
citations
#19

A Benchmark for Learning to Translate a New Language from One Grammar Book

Garrett Tanzer, Mirac Suzgun, Eline Visser et al.

ICLR 2024
79
citations
#20

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024
77
citations
#21

MMTEB: Massive Multilingual Text Embedding Benchmark

Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.

ICLR 2025arXiv:2502.13595
text embedding evaluationmultilingual benchmarksinstruction following taskslong-document retrieval+4
74
citations
#22

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.

ICLR 2025
63
citations
#23

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Fei Shen, Hu Ye, Sibo Liu et al.

AAAI 2025
62
citations
#24

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024
62
citations
#25

BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks

Frederikke Marin, Felix Teufel, Marc Horlacher et al.

ICLR 2024
56
citations
#26

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time

Sensitive Test Construction - Yucheng Li, Frank Guerin, Chenghua Lin

AAAI 2024arXiv:2312.12343
data contaminationlanguage model evaluationreading comprehensiondynamic evaluation+4
53
citations
#27

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen, Can Rager, Johnny Lin et al.

ICML 2025
51
citations
#28

Energy-Based Diffusion Language Models for Text Generation

Minkai Xu, Tomas Geffner, Karsten Kreis et al.

ICLR 2025
48
citations
#29

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024
48
citations
#30

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers

Daiheng Gao, Shilin Lu, Wenbo Zhou et al.

ICML 2025
47
citations
#31

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.

ICLR 2025
faithfulness hallucinationretrieval-augmented generationcontextual evaluation benchmarkunanswerable context handling+3
45
citations
#32

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025arXiv:2409.13156
reward model trainingreward hacking mitigationcausal preference learningdata augmentation techniques+4
44
citations
#33

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025
44
citations
#34

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024arXiv:2312.10493
multimodal sarcasm detectioncontrastive learningout-of-distribution generalizationdebiasing methods+4
43
citations
#35

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.

CVPR 2024
42
citations
#36

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025arXiv:2412.02210
large multimodal modelsoptical character recognitionmultilingual text readingdocument parsing+4
42
citations
#37

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025
40
citations
#38

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024
39
citations
#39

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025
39
citations
#40

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Xin Li, Yunfei Wu, Xinghua Jiang et al.

CVPR 2024
37
citations
#41

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez

CVPR 2024
37
citations
#42

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025arXiv:2407.06057
language model alignmentpreference learningcontrolled text generationtext summarization+4
37
citations
#43

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods

Mara Finkelstein, Markus Freitag

ICLR 2024
36
citations
#44

Amodal Completion via Progressive Mixed Context Diffusion

Katherine Xu, Lingzhi Zhang, Jianbo Shi

CVPR 2024
36
citations
#45

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.

CVPR 2024
35
citations
#46

Spurious Feature Diversification Improves Out-of-distribution Generalization

LIN Yong, Lu Tan, Yifan HAO et al.

ICLR 2024
33
citations
#47

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025
32
citations
#48

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025
32
citations
#49

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025
31
citations
#50

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024arXiv:2407.20337
contrastive learningdeepfake detectiondiffusion modelsglobal-local similarities+3
31
citations
#51

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Rui Liu, Yifan Hu, Yi Ren et al.

AAAI 2024arXiv:2312.11947
conversational speech synthesisemotional context modelingheterogeneous graph networkscontrastive learning+4
29
citations
#52

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025
29
citations
#53

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Lin Sun, Kai Zhang, Qingyuan Li et al.

AAAI 2024arXiv:2401.03082
multimodal information extractioninstruction tuningunified modelgeneration problem+3
29
citations
#54

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025
28
citations
#55

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Changki Sung, Wanhee Kim, Jungho An et al.

CVPR 2024
28
citations
#56

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025
28
citations
#57

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.

ICLR 2025
27
citations
#58

Energy-guided Entropic Neural Optimal Transport

Petr Mokrov, Alexander Korotin, Alexander Kolesov et al.

ICLR 2024
27
citations
#59

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

jiazhou zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024
27
citations
#60

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025
27
citations
#61

Text-Based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning

Xinyi Wu, Wentao Ma, Dan Guo et al.

AAAI 2024
26
citations
#62

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Chaofeng Chen, Annan Wang, Haoning Wu et al.

ECCV 2024
26
citations
#63

UMBRAE: Unified Multimodal Brain Decoding

Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.

ECCV 2024
26
citations
#64

Improved baselines for vision-language pre-training

Jakob Verbeek, Enrico Fini, Michal Drozdzal et al.

ICLR 2024
26
citations
#65

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

ZUYAN LIU, Benlin Liu, Jiahui Wang et al.

ECCV 2024
25
citations
#66

SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-Supervised Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Josef Kittler et al.

AAAI 2024arXiv:2309.05834
skeleton-based action recognitioncontrastive learningspatiotemporal disentanglementmasked image modeling+4
24
citations
#67

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025
24
citations
#68

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

Howard Yen, Tianyu Gao, Minmin Hou et al.

ICLR 2025
long-context language modelsbenchmark evaluationneedle-in-a-haystack tasksretrieval-augmented generation+3
23
citations
#69

Facial Affective Behavior Analysis with Instruction Tuning

Yifan Li, Anh Dao, Wentao Bao et al.

ECCV 2024
23
citations
#70

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025
23
citations
#71

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025
23
citations
#72

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang, JINGYUAN YANG, Wei Peng

ICLR 2025
23
citations
#73

$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Zhongwei Wan, Xinjian Wu, Yu Zhang et al.

ICLR 2025
kv cache optimizationattention score analysislong-context inferencegenerative inference efficiency+2
22
citations
#74

DIM: Dyadic Interaction Modeling for Social Behavior Generation

Minh Tran, Di Chang, Maksim Siniukov et al.

ECCV 2024
dyadic interaction modelingsocial behavior generation3d facial motioncontrastive learning+4
22
citations
#75

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024
22
citations
#76

A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation

Yongkang Wang, Xuan Liu, Feng Huang et al.

AAAI 2024arXiv:2312.15665
therapeutic peptide generationmulti-modal fusioncontrastive learningdiffusion models+3
22
citations
#77

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024arXiv:2304.10520
masked image modelingmasked autoencodersinstance discriminationcontrastive tuning+4
21
citations
#78

Improving Semantic Understanding in Speech Language Models via Brain-tuning

Omer Moussa, Dietrich Klakow, Mariya Toneva

ICLR 2025
21
citations
#79

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

chengqian gao, Haonan Li, Liu Liu et al.

ICML 2025
20
citations
#80

ConR: Contrastive Regularizer for Deep Imbalanced Regression

Mahsa Keramati, Lili Meng, R. Evans

ICLR 2024
20
citations
#81

Generating and Reweighting Dense Contrastive Patterns for Unsupervised Anomaly Detection

Songmin Dai, Yifan Wu, Xiaoqiang Li et al.

AAAI 2024arXiv:2312.15911
unsupervised anomaly detectiondiffusion modelscontrastive pattern generationanomaly generation paradigm+4
20
citations
#82

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025
19
citations
#83

Customizing Language Model Responses with Contrastive In-Context Learning

Xiang Gao, Kamalika Das

AAAI 2024arXiv:2401.17390
contrastive learninglanguage model alignmentin-context learningintent customization+4
19
citations
#84

Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu, Xuefeng Li, Pengfei Liu

ICLR 2025
18
citations
#85

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Qizhe Zhang, Mengzhen Liu, Lichen Li et al.

NeurIPS 2025
18
citations
#86

A New Mechanism for Eliminating Implicit Conflict in Graph Contrastive Learning

Dongxiao He, Jitao Zhao, Cuiying Huo et al.

AAAI 2024
18
citations
#87

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing

Haoyu Zhao, Tianyi Lu, Jiaxi Gu et al.

ECCV 2024
18
citations
#88

Implicit Concept Removal of Diffusion Models

Zhili LIU, Kai Chen, Yifan Zhang et al.

ECCV 2024arXiv:2310.05873
text-to-image diffusionimplicit concept removalgeometric-driven controlnegative prompt optimization+3
18
citations
#89

A Dual-Way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking

Shezheng Song, Shan Zhao, ChengYu Wang et al.

AAAI 2024arXiv:2312.11816
multimodal entity linkingneural text matchingcross-modal enhancementfine-grained image attributes+3
18
citations
#90

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Haiwen Diao, Xiaotong Li, Yufeng Cui et al.

ICCV 2025
18
citations
#91

Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts

Andong Tan, Fengtao Zhou, Hao Chen

ECCV 2024
17
citations
#92

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.

NeurIPS 2025
15
citations
#93

BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning

Qianhan Feng, Lujing Xie, Shijie Fang et al.

AAAI 2024arXiv:2403.12986
semi-supervised learningclass imbalancecontrastive learningfeature-level regularization+4
15
citations
#94

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

Changdae Oh, Yixuan Li, Kyungwoo Song et al.

ICLR 2025
15
citations
#95

RocketEval: Efficient automated LLM evaluation via grading checklist

Tianjun Wei, Wei Wen, Ruizhi Qiao et al.

ICLR 2025
15
citations
#96

ET-SEED: EFFICIENT TRAJECTORY-LEVEL SE(3) EQUIVARIANT DIFFUSION POLICY

Chenrui Tie, Yue Chen, Ruihai Wu et al.

ICLR 2025
15
citations
#97

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Julie Kallini, Shikhar Murty, Christopher Manning et al.

ICLR 2025
14
citations
#98

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025
14
citations
#99

Explore In-Context Segmentation via Latent Diffusion Models

Chaoyang Wang, Xiangtai Li, Henghui Ding et al.

AAAI 2025
14
citations
#100

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Jinluan Yang, Dingnan Jin, Anke Tang et al.

NeurIPS 2025arXiv:2502.06876
model merging3h optimizationlarge language model alignmentparameter-level conflict resolution+4
13
citations