Fairness in ML
Fair and unbiased models
Related Topics (Ethics & Safety)
Top Papers
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li, Wei-Lin Chiang, Evan Frick et al.
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan Wang, Sijie Cheng, Xianyuan Zhan et al.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
Chongyu Fan, Jiancheng Liu, Yihua Zhang et al.
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen, Chao Du, Tianyu Pang et al.
Human Feedback is not Gold Standard
Tom Hosking, Phil Blunsom, Max Bartolo
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
Zhiyuan You, Xin Cai, Jinjin Gu et al.
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà, Elia Peruzzo et al.
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.
Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
Micah Goldblum, Marc Finzi, Keefer Rowan et al.
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Oliver Jaffe et al.
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen, Zichen Wen, Yichao Du et al.
A Decade's Battle on Dataset Bias: Are We There Yet?
Zhuang Liu, Kaiming He
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard, Avinash Madasu, Tiep Le et al.
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Jaehun Jung, Faeze Brahman, Yejin Choi
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang, Haoning Wu, Chunyi Li et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
Spurious Feature Diversification Improves Out-of-distribution Generalization
Yong Lin, Lu Tan, Yifan Hao et al.
FairSIN: Achieving Fairness in Graph Neural Networks through Sensitive Information Neutralization
Cheng Yang, Jixi Liu, Yunhe Yan et al.
Training Unbiased Diffusion Models From Biased Dataset
Yeongmin Kim, Byeonghu Na, Minsang Park et al.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Yu Tian, Congcong Wen, Min Shi et al.
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng et al.
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou, Haote Yang, Dairong Chen et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
Wenhui Zhu, Xiwen Chen, Peijie Qiu et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li, Anh Dao, Wentao Bao et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani et al.
Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael Curry, Sven Seuken
Diverse Preference Learning for Capabilities and Alignment
Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell
Pathologies of Predictive Diversity in Deep Ensembles
Geoff Pleiss, Taiga Abe, E. Kelly Buchanan et al.
Debiasing Algorithm through Model Adaptation
Tomasz Limisiewicz, David Mareček, Tomáš Musil
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
Chengqian Gao, Haonan Li, Liu Liu et al.
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu, Laura Ruis, Tim Rocktäschel et al.
Weighted Envy-Freeness for Submodular Valuations
Luisa Montanari, Ulrike Schmidt-Kraepelin, Warut Suksompong et al.
First-Person Fairness in Chatbots
Tyna Eloundou, Alex Beutel, David Robinson et al.
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao, Ran Cheng, Xingyu Wu et al.
Repeated Fair Allocation of Indivisible Items
Ayumi Igarashi, Martin Lackner, Oliviero Nardi et al.
Unprocessing Seven Years of Algorithmic Fairness
André F. Cruz, Moritz Hardt
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park, Hyeran Byun
Project-Fair and Truthful Mechanisms for Budget Aggregation
Rupert Freeman, Ulrike Schmidt-Kraepelin
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
Model Equality Testing: Which Model is this API Serving?
Irena Gao, Percy Liang, Carlos Guestrin
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei, Wei Wen, Ruizhi Qiao et al.
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
Haoyang Liu, Jie Wang, Zijie Geng et al.
Position: Editing Large Language Models Poses Serious Safety Risks
Paul Youssef, Zhixue Zhao, Daniel Braun et al.
BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning
Qianhan Feng, Lujing Xie, Shijie Fang et al.
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung, Seungju Han, Ximing Lu et al.
MFABA: A More Faithful and Accelerated Boundary-Based Attribution Method for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Jiayu Zhang et al.
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du, Yiming Yang, Sean Welleck
Regroup Median Loss for Combating Label Noise
Fengpeng Li, Kemou Li, Jinyu Tian et al.
Towards Fair Graph Federated Learning via Incentive Mechanisms
Chenglu Pan, Jiarong Xu, Yue Yu et al.
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang, Dingnan Jin, Anke Tang et al.
How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence
Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei et al.
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma, Jiongxiao Wang, Fei Wang et al.
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Song Wang, Peng Wang, Tong Zhou et al.
Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Xuankun Rong, Wenke Huang, Jian Liang et al.
Weighted-Reward Preference Optimization for Implicit Model Fusion
Ziyi Yang, Fanqi Wan, Longguang Zhong et al.
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.
Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
Yuefan Cao, Xiaoyu Li, Yingyu Liang et al.
Imputation for prediction: beware of diminishing returns
Marine Le Morvan, Gaël Varoquaux
Minimum-Norm Interpolation Under Covariate Shift
Neil Mallinar, Austin Zane, Spencer Frei et al.
CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
Hyuck Lee, Heeyoung Kim
Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel et al.
Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation
Hsiang Hsu, Guihong Li, Shaohan Hu et al.
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
Zhaobo Qi, Yibo Yuan, Xiaowen Ruan et al.
Causal Fairness under Unobserved Confounding: A Neural Sensitivity Framework
Maresa Schröder, Dennis Frauen, Stefan Feuerriegel
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
Thomas Zollo, Todd Morrill, Zhun Deng et al.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
Reliable and Efficient Amortized Model-based Evaluation
Sang Truong, Yuheng Tu, Percy Liang et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang, Ru Zhen, Jianing Wang et al.
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu et al.
Post-hoc bias scoring is optimal for fair classification
Wenlong Chen, Yegor Klochkov, Yang Liu
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
Constrained Fair and Efficient Allocations
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
PowerMLP: An Efficient Version of KAN
Ruichen Qiu, Yibo Miao, Shiwen Wang et al.
Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
Peiyan Zhang, Haoyang Liu, Chaozhuo Li et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Hadi Hosseini, Samarth Khanna