Mirror of https://github.com/jackyzha0/quartz.git (synced 2025-12-19 19:04:06 -06:00)
Add image
parent 238c471869
commit 0756773de7
BIN content/assets/selfplay.png (Normal file)
Binary file not shown. Size after: 5.6 MiB
@@ -8,8 +8,14 @@ tags:
- reinforcement
- learning
---

![[selfplay.png]]
*Source: [Liu, Guertler et al., 2025](https://arxiv.org/abs/2506.24119)*.

## TL;DR
_We collaborated with the TextArena team to develop SPIRAL, a novel RL framework that allows LLMs to develop complex reasoning capabilities by playing text-based games against themselves. Using SPIRAL on a simplified variant of poker with no mathematical content, a 4B-parameter Qwen model improved its performance on math and reasoning benchmarks by 8.6% and 8.4% respectively. It does this by learning specific strategies, such as case-by-case analysis and expected value calculation, that generalize beyond poker better than simple game heuristics. We're excited to explore whether self-play on social deduction games like Mafia can lead to general improvements in LLMs' social cognition._

---
## Teaching Social Cognition Through Games
At Plastic Labs, one of our key research interests is improving language models' social cognition: their ability to represent people's mental states, predict users' behaviors, and navigate complex social dynamics. This capability is essential for creating AI systems that can genuinely understand and adapt to individual users, yet it remains underdeveloped compared to technical abilities and so-called "hard skills" like reasoning and coding.