Add image

2026-03-22 05:55:42 -05:00 · 2025-08-26 16:43:02 -04:00 · 0756773de7
commit 0756773de7
parent 238c471869
2 changed files with 7 additions and 1 deletion
--- a/content/assets/selfplay.png
+++ b/content/assets/selfplay.png
--- a/content/research/SPIRAL
+++ b/content/research/SPIRAL
@@ -8,8 +8,14 @@ tags:
  - reinforcement
  - learning
 ---
+
+![[selfplay.png]]
+*Source: [Liu, Guertler et al., 2025](https://arxiv.org/abs/2506.24119)*.
+
 ## TL;DR
-*We collaborated with the TextArena team to develop SPIRAL, a novel RL framework that allows LLMs to develop complex reasoning capabilities by playing text-based games against themselves. Using SPIRAL on a simplified variant of poker with no mathematical content, a 4B-parameter Qwen model improved its performance on math and reasoning benchmarks by 8.6% and 8.4% respectively. It does this by learning specific strategies, such as case-by-case analysis and expected value calculation, that generalize beyond poker better than simple game heuristics. We're excited to explore whether self-play on social deduction games like Mafia can lead to general improvements in LLMs' social cognition.*
+_We collaborated with the TextArena team to develop SPIRAL, a novel RL framework that allows LLMs to develop complex reasoning capabilities by playing text-based games against themselves. Using SPIRAL on a simplified variant of poker with no mathematical content, a 4B-parameter Qwen model improved its performance on math and reasoning benchmarks by 8.6% and 8.4% respectively. It does this by learning specific strategies, such as case-by-case analysis and expected value calculation, that generalize beyond poker better than simple game heuristics. We're excited to explore whether self-play on social deduction games like Mafia can lead to general improvements in LLMs' social cognition._
+
+---
 ## Teaching Social Cognition Through Games
 At Plastic Labs, one of our key research interests is improving language models' social cognition: their ability to represent people's mental states, predict users' behaviors, and navigate complex social dynamics. This capability is essential for creating AI systems that can genuinely understand and adapt to individual users, yet it remains underdeveloped compared to technical abilities and so-called "hard skills" like reasoning and coding.