---
title: LLMs excel at theory of mind because they read
date: 02.20.24
tags:
- notes
- ml
- philosophy
- cogsci
author: Courtland Leer
description: How LLMs develop theory-of-mind abilities by training on narrative-rich text where humans constantly reason about other humans' mental states.
---
Large language models are [simulators](https://generative.ink/posts/simulators/). In predicting the next likely token, they are simulating how an abstracted “_any person_” might continue the generation. The basis for this simulation is the aggregate compression of a massive corpus of human-generated natural language from the internet. So, predicting humans is _literally_ their core function.
In that corpus is our literature, our philosophy, our social media, our hard and social sciences--the knowledge graph of humanity, both in terms of discrete facts and messy human interaction. That last bit is important. The latent space of an LLM's pretraining is in large part a _narrative_ space: narration chock-full of humans reasoning about other humans--predicting what they will do next, what they might be thinking, how they might be feeling.
That's no surprise; we're a social species with robust social cognition. It's also no surprise[^1] that grokking that interpersonal narrative space in its entirety would make LLMs adept at [[Loose theory of mind imputations are superior to verbatim response predictions|generation resembling social cognition too]].[^2]
We know that in humans, reading [correlates strongly with improved theory of mind abilities](https://journal.psych.ac.cn/xlkxjz/EN/10.3724/SP.J.1042.2022.00065). When your neural network is consistently exposed to content about how other people think, feel, desire, believe, and prefer, those mental tasks are reinforced. The more experience you have with a set of ideas or states, the more adept you become.
The experience of such natural language narration _is itself a simulation_ where you practice and hone your theory of mind abilities. Even if, say, your English or Psychology teacher was foisting the text on you with other training intentions. Or even if you ran the simulation without coercion, as an escape at the beach.
It's not such a stretch to imagine that in optimizing for other tasks LLMs acquire emergent abilities they weren't intentionally trained for.[^3] It may even be that in order to learn natural language prediction, these systems need theory of mind abilities, or that learning language specifically involves them--that's certainly the case with human wetware systems, and theory of mind skills do seem to improve with model size and language generation efficacy.
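To make the "next likely token" framing at the top of this note concrete, here's a minimal sketch (my own illustration, not part of the original argument) using Hugging Face's `transformers` with GPT-2; the model and prompt are arbitrary choices. It prints the model's probability distribution over continuations of a sentence that invites a mental-state attribution:

```python
# Minimal sketch: peek at an LLM's next-token distribution after a
# mental-state-laden prompt. GPT-2 and the prompt are illustrative
# assumptions, not something prescribed by the note above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# A prompt that invites the model to continue with a mental-state attribution.
prompt = "She saw the empty cookie jar and assumed her little brother was feeling"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch=1, seq_len, vocab_size)

# The "simulation" of how an abstracted narrator might continue:
# a probability distribution over every possible next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r:>14}  p={prob.item():.3f}")
```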
[^1]: Kosinski includes a compelling treatment of much of this in ["Evaluating Large Language Models in Theory of Mind Tasks"](https://arxiv.org/abs/2302.02083).
[^2]: It also leads to other wacky phenomena like the [Waluigi effect](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post#The_Waluigi_Effect).
[^3]: Here's Chalmers [making a very similar point](https://youtube.com/clip/UgkxliSZFnnZHvYf2WHM4o1DN_v4kW6LsiOU?feature=shared).