"large language models" Papers

634 papers found • Page 2 of 13

C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

Antonios Valkanas, Soumyasundar Pal, Pavel Rumiantsev et al.

NeurIPS 2025 • poster • arXiv:2511.07396

Calibrating Translation Decoding with Quality Estimation on LLMs

Di Wu, Yibin Lei, Christof Monz

NeurIPS 2025 • poster • arXiv:2504.19044

CAMEx: Curvature-aware Merging of Experts

Dung Viet Nguyen, Minh Nguyen, Luc Nguyen et al.

ICLR 2025 • poster • arXiv:2502.18821
6 citations

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

Hanlei Zhang, Zhuohang Li, Hua Xu et al.

NeurIPS 2025 • poster • arXiv:2504.16427
2 citations

Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation

Qijiong Liu, Jieming Zhu, Lu Fan et al.

NeurIPS 2025 • poster • arXiv:2503.05493
4 citations

Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning

Tianle Zhang, Wanlong Fang, Jonathan Woo et al.

NeurIPS 2025 • poster • arXiv:2509.17552
1 citation

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.

ICLR 2025 • poster • arXiv:2403.06833
45 citations

Can LLMs Solve Longer Math Word Problems Better?

Xin Xu, Tong Xiao, Zitong Chao et al.

ICLR 2025 • poster • arXiv:2405.14804
25 citations

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025 • poster • arXiv:2410.05440
32 citations

Can We Infer Confidential Properties of Training Data from LLMs?

Pengrun Huang, Chhavi Yadav, Kamalika Chaudhuri et al.

NeurIPS 2025 • spotlight • arXiv:2506.10364
3 citations

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Laura Kopf, Nils Feldhus, Kirill Bykov et al.

NeurIPS 2025 • poster • arXiv:2506.15538
4 citations

Catastrophic Failure of LLM Unlearning via Quantization

Zhiwei Zhang, Fali Wang, Xiaomin Li et al.

ICLR 2025 • poster • arXiv:2410.16454
43 citations

Causally Motivated Sycophancy Mitigation for Large Language Models

Haoxi Li, Xueyang Tang, Jie Zhang et al.

ICLR 2025 • poster
8 citations

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025 • poster • arXiv:2305.00050
390 citations

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Song Wang, Peng Wang, Tong Zhou et al.

ICLR 2025 • poster • arXiv:2407.02408
13 citations

CellVerse: Do Large Language Models Really Understand Cell Biology?

Fan Zhang, Tianyu Liu, Zhihong Zhu et al.

NeurIPS 2025 • poster • arXiv:2505.07865
4 citations

Certifying Counterfactual Bias in LLMs

Isha Chaudhary, Qian Hu, Manoj Kumar et al.

ICLR 2025 • poster • arXiv:2405.18780
4 citations

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.

ICLR 2025 • poster • arXiv:2410.01943
116 citations

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Peng Xu, Wei Ping, Xianchao Wu et al.

ICLR 2025 • poster • arXiv:2407.14482

ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning

Xiangru Tang, Tianyu Hu, Muyang Ye et al.

ICLR 2025 • poster
4 citations

CIDD: Collaborative Intelligence for Structure-Based Drug Design Empowered by LLMs

Bowen Gao, Yanwen Huang, Yiqiao Liu et al.

NeurIPS 2025 • poster

Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Augusto B. Corrêa, André G. Pereira, Jendrik Seipp

NeurIPS 2025 • poster • arXiv:2503.18809
13 citations

CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections

Keuntae Kim, Eunhye Jeong, Sehyeon Lee et al.

NeurIPS 2025 • poster

ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky et al.

ICLR 2025 • poster • arXiv:2410.16701
3 citations

ClinBench: A Standardized Multi-Domain Framework for Evaluating Large Language Models in Clinical Information Extraction

Ismael Villanueva Miranda, Zifan Gu, Donghan Yang et al.

NeurIPS 2025 • poster

CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark

Jian Wu, Linyi Yang, Zhen Wang et al.

ICLR 2025 • poster • arXiv:2402.11924
14 citations

Competing Large Language Models in Multi-Agent Gaming Environments

Jen-Tse Huang, Eric John Li, Man Ho Lam et al.

ICLR 2025 • poster

Computation and Memory-Efficient Model Compression with Gradient Reweighting

Zhiwei Li, Yuesen Liao, Binrui Wu et al.

NeurIPS 2025 • poster

Concept-Guided Interpretability via Neural Chunking

Shuchen Wu, Stephan Alaniz, Shyamgopal Karthik et al.

NeurIPS 2025 • poster • arXiv:2505.11576

Concept Incongruence: An Exploration of Time and Death in Role Playing

Xiaoyan Bai, Ike Peng, Aditya Singh et al.

NeurIPS 2025 • oral • arXiv:2505.14905
1 citation

Conditional Representation Learning for Customized Tasks

Honglin Liu, Chao Sun, Peng Hu et al.

NeurIPS 2025 • spotlight • arXiv:2510.04564

Confidence Elicitation: A New Attack Vector for Large Language Models

Brian Formento, Chuan Sheng Foo, See-Kiong Ng

ICLR 2025 • poster • arXiv:2502.04643
2 citations

ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Yibo Li, Miao Xiong, Jiaying Wu et al.

NeurIPS 2025 • poster • arXiv:2508.18847
10 citations

Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance

Sachin Goyal, Christina Baek, Zico Kolter et al.

ICLR 2025 • poster
9 citations

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Marco Spinaci, Marek Polewczyk, Maximilian Schambach et al.

NeurIPS 2025 • spotlight • arXiv:2506.10707
7 citations

Contextualizing biological perturbation experiments through language

Menghua (Rachel) Wu, Russell Littman, Jacob Levine et al.

ICLR 2025 • poster • arXiv:2502.21290
3 citations

CoP: Agentic Red-teaming for Large Language Models using Composition of Principles

Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho

NeurIPS 2025 • poster • arXiv:2506.00781
3 citations

Cost-aware LLM-based Online Dataset Annotation

Eray Can Elumar, Cem Tekin, Osman Yagan

NeurIPS 2025 • spotlight • arXiv:2505.15101
1 citation

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Sophia Han, Howard Dai, Stephen Xia et al.

NeurIPS 2025 • poster • arXiv:2505.10844
1 citation

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong et al.

ICLR 2025 • poster • arXiv:2406.08587
27 citations

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025 • poster • arXiv:2410.00379
16 citations

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

Yue Jiang, Jichu Li, Yang Liu et al.

NeurIPS 2025 • oral • arXiv:2505.18411
3 citations

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Yue Huang, Siyuan Wu, Chujie Gao et al.

ICLR 2025 • poster • arXiv:2406.18966
21 citations

DataMan: Data Manager for Pre-training Large Language Models

Ru Peng, Kexin Yang, Yawen Zeng et al.

ICLR 2025 • poster • arXiv:2502.19363
8 citations

DataSIR: A Benchmark Dataset for Sensitive Information Recognition

Fan Mo, Bo Liu, Yuan Fan et al.

NeurIPS 2025 • poster

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025 • poster • arXiv:2505.04965
8 citations

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

Yisong Xiao, Aishan Liu, Siyuan Liang et al.

NeurIPS 2025 • poster • arXiv:2510.01243
2 citations

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Zhixuan Liang, Yao Mu, Yixiao Wang et al.

CVPR 2025 • poster • arXiv:2411.18562
12 citations

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models

Simone Carnemolla, Matteo Pennisi, Sarinda Samarasinghe et al.

NeurIPS 2025 • spotlight • arXiv:2510.14741

Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix

Ming Wen, Jiaqi Zhu, Yuedong Xu et al.

NeurIPS 2025 • poster • arXiv:2507.09990