The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”, by Lucas Berglund and 6 other authors
Abstract: We uncover a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For example, if a model is trained on “Valentina Tereshkova was the first woman to travel into space”, it will not automatically be able to answer the question, “Who was the first woman to travel into space?”. Moreover, the likelihood of the correct answer (“Valentina Tereshkova”) will be no higher than that of a random name. Thus, models fail to generalize a prevalent pattern in their training set: if “A is B” occurs, “B is A” is more likely to occur. It is worth noting, however, that if “A is B” appears in context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by fine-tuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse, “Who is Mary Lee Pfeiffer’s son?”. GPT-4 answers questions like the former correctly 79% of the time, compared to 33% for the latter.
The code is available here: This https URL.
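The evaluation described in the abstract (train on “A is B”, then test both directions) can be sketched roughly as follows. This is not the authors' code: the helper names (`make_eval_items`, `make_prefix_model`, `accuracy`), the “is” sentence template, and the substring-match scoring are all assumptions made for illustration. The “model” is a toy prefix lookup standing in for an auto-regressive LLM, so its forward/reverse gap holds by construction and is not evidence for the paper's claim.

```python
from typing import Callable, Dict, List, Tuple

# (name, description) pairs borrowed from the abstract's examples.
# The "<name> is <description>." template is a simplifying assumption.
FACTS: List[Tuple[str, str]] = [
    ("Uriah Hawthorne", "the composer of Abyssal Melodies"),
    ("Valentina Tereshkova", "the first woman to travel into space"),
]

EvalItem = Tuple[str, str]  # (prompt, expected completion)


def make_eval_items(facts: List[Tuple[str, str]]) -> Dict[str, List[EvalItem]]:
    """Build completion prompts for the trained (forward) and reverse directions."""
    forward = [(f"{name} is", desc) for name, desc in facts]
    reverse = [(f"{desc[0].upper()}{desc[1:]} is", name) for name, desc in facts]
    return {"forward": forward, "reverse": reverse}


def make_prefix_model(training_sentences: List[str]) -> Callable[[str], str]:
    """Toy stand-in for an LLM: continues a prompt only if it is a prefix
    of a memorized training sentence (hypothetical, for illustration)."""
    def complete(prompt: str) -> str:
        for sentence in training_sentences:
            if sentence.startswith(prompt):
                return sentence[len(prompt):].strip()
        return ""
    return complete


def accuracy(complete: Callable[[str], str], items: List[EvalItem]) -> float:
    """Fraction of prompts whose completion contains the expected answer."""
    hits = sum(expected.lower() in complete(prompt).lower()
               for prompt, expected in items)
    return hits / len(items)


# "Training" data contains only the forward ordering, name before description.
training = [f"{name} is {desc}." for name, desc in FACTS]
model = make_prefix_model(training)
items = make_eval_items(FACTS)

forward_acc = accuracy(model, items["forward"])
reverse_acc = accuracy(model, items["reverse"])
print(f"forward accuracy: {forward_acc:.2f}")
print(f"reverse accuracy: {reverse_acc:.2f}")
```

To evaluate a real model, `make_prefix_model` would be replaced by a function that queries the fine-tuned LLM; the prompt construction and per-direction scoring could stay the same.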
Submission History
From: Owen Evans
[v1]
Thu, 21 Sep 2023 17:52:19 UTC (1,320 KB)
[v2]
Fri, 22 Sep 2023 18:08:20 UTC (1,319 KB)
[v3]
Thu, 4 Apr 2024 21:25:17 UTC (1,336 KB)
[v4]
Sun, 26 May 2024 17:45:21 UTC (1,336 KB)