"large language models" Papers
472 papers found • Page 5 of 10
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.
SWE-bench Goes Live!
Linghao Zhang, Shilin He, Chaoyun Zhang et al.
System Prompt Optimization with Meta-Learning
Yumin Choi, Jinheon Baek, Sung Ju Hwang
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
Jiacheng Xie, Yang Yu, Ziyang Zhang et al.
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du, Jinyi Han, Yizhou Ying et al.
Timely Clinical Diagnosis through Active Test Selection
Silas Ruhrberg Estévez, Nicolás Astorga, Mihaela van der Schaar
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
Training-Free Activation Sparsity in Large Language Models
James Liu, Pragaash Ponnusamy, Tianle Cai et al.
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
Haizhou Shi, Yibin Wang, Ligong Han et al.
TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
Yuwei Du, Jie Feng, Jie Zhao et al.
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
Kairui Yang, Zihao Guo, Gengjie Lin et al.
Tree of Preferences for Diversified Recommendation
Hanyang Yuan, Ning Tang, Tongya Zheng et al.
Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs
Yibo Wang, Hai-Long Sun, Guangda Huzhang et al.
Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Herun Wan, Jiaying Wu, Minnan Luo et al.
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
Xiang Meng, Mehdi Makni, Rahul Mazumder
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Xin Xu, Jiaxin Zhang, Tianhao Chen et al.
Uni-LoRA: One Vector is All You Need
Kaiyang Li, Shaobo Han, Qing Su et al.
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu, Ruirui Li, Yunzhe Qi et al.
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang, Caigao Jiang, Zhaoyi Li et al.
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
Wenlong Li, Yifei Xu, Yuan Rao et al.
Valid Inference with Imperfect Synthetic Data
Yewon Byun, Shantanu Gupta, Zachary Lipton et al.
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu et al.
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Ong et al.
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
Pulkit Gopalani, Wei Hu
Why Does the Effective Context Length of LLMs Fall Short?
Chenxin An, Jun Zhang, Ming Zhong et al.
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
WritingBench: A Comprehensive Benchmark for Generative Writing
Yuning Wu, Jiahao Mei, Ming Yan et al.
$S^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
Zijie Pan, Yushan Jiang, Sahil Garg et al.
Accelerated Speculative Sampling Based on Tree Monte Carlo
Zhengmian Hu, Heng Huang
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Haotong Qin, Xudong Ma, Xingyu Zheng et al.
A Closer Look at the Limitations of Instruction Tuning
Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang et al.
Adaptive Text Watermark for Large Language Models
Yepeng Liu, Yuheng Bu
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Nicholas Crispino, Kyle Montgomery, Fankun Zeng et al.
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar et al.
AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training
Ziyu Wan, Xidong Feng, Muning Wen et al.
A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?
Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta et al.
Assessing Large Language Models on Climate Information
Jannis Bulian, Mike Schäfer, Afra Amini et al.
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
Yuan Chen, Zi-han Ding, Ziqin Wang et al.
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.
Autoformalizing Euclidean Geometry
Logan Murphy, Kaiyu Yang, Jialiang Sun et al.