Merge pull request #119 from plastic-labs/chl/renovation_12.25

first pass blog reno
This commit is contained in:
Courtland Leer 2025-12-08 21:17:00 -05:00 committed by GitHub
commit 9d3d2e2ece
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
79 changed files with 697 additions and 2336 deletions

View File

@ -1,39 +1,40 @@
---
title: Home
enableToc: false
description: Welcome to our collaborative second brain.
description: Welcome to Plastic Labs' blog.
---
> [!custom] PLASTIC IS HIRING!
> [[Working at Plastic|Open positions here]].
Welcome.
Here you'll find our blog, research, and public notes. You can also [engage with the ideas directly](https://github.com/plastic-labs/blog).
[Plastic](https://plasticlabs.ai) is an engineering-driven AI lab building at the intersection of machine learning and cognitive science.
Our focus is developing systems that map personal identity using AI-native memory & social cognition. These systems enable individually-aligned agents you can trust to act autonomously and agents with rich identities all their own.
The foundational layer of intelligence being built is just the beginning. Latent among the scores of specialized secondary and tertiary layers yet to be realized exists one for personal identity.
We're building it.
> [!custom] WELCOME TO [PLASTIC LABS](https://plasticlabs.ai)
>
> Here you'll find our blog, research, and public notes. You can also [engage with the ideas directly](https://github.com/plastic-labs/blog).
>
> [Plastic](https://plasticlabs.ai) is an engineering-driven AI lab building at the intersection of machine learning and cognitive science.
>
> Our focus is developing [Honcho](https://honcho.dev/), an AI-native memory solution powered by our state-of-the-art [reasoning models](https://plasticlabs.ai/neuromancer). Honcho is a continual learning system for modeling personal identity, and soon a shared context layer for individual alignment.
>
> The foundational layer of intelligence being built is just the beginning. Latent among the scores of specialized secondary and tertiary layers yet to be realized exists one for personal identity.
>
> We're building it.
# Guide
We post a few different types of content here:
- [[blog | Blog]] -- Deep dives into the philosophy, cogsci, ML, & development underpinning our projects
- [[careers | Careers]] -- Open positions at Plastic
- [[notes | Evergreen Notes]] -- Short form working notes on Plastic theses
- [[extrusions | Extrusions]] -- Brief, densely-linked reflections synthesizing recent conceptual work
- [[releases | Release Notes]] -- Changelogs & details on new product features
- [[research | Research]] -- Formal published, preprint, or blog-style research we've made public
- [[blog|Blog]] - Deep dives into the cogsci, development, & ML underpinning our projects
- [[research|Research]] - Preprint or blog-style research we've made public
- [[notes|Notes]] - Short-form working notes on Plastic theses
- [[archive|Archive]] - Legacy content about out-of-date or deprecated projects & features
- [[careers|Careers]] - Open positions at Plastic
[*Subscribe to Updates*](https://plasticlabs.typeform.com/mailing)
[*Subscribe to updates*](https://plasticlabs.typeform.com/mailing).
# Projects
If you find the content here compelling, explore our active projects:
Explore our active projects:
- [Honcho](https://honcho.dev) -- AI-native memory, reasoning, & socialcog for apps & agents ( #honcho)
- [Neuromancer](https://plasticlabs.ai/neuromancer) -- Reasoning models for memory & personal identity ( #neuromancer)
- [YouSim](https://yousim.ai) -- Honcho-powered identity simulator ( #yousim)
- [Penny for Your Thoughts](https://www.pennyforyourthoughts.ai/) -- Honcho/x402-powered personal expertise market ( #penny)
- [Bloom](https://bloombot.ai) -- Honcho-powered learning companion ( #bloom)
- [Xeno Grant](https://x.com/xenograntai) -- Direct to agent grants program ( #grants)
**PRODUCTS**
- [Honcho](https://honcho.dev) - AI-native memory & reasoning infra for apps & agents ( #honcho)
- [Neuromancer](https://plasticlabs.ai/neuromancer) - Reasoning models for memory & personal identity ( #neuromancer)
**DEMOS**
- [Honcho Chat](https://honcho.chat) - Honcho-powered AI-assistant platform with SOTA memory ( #chat)
- [Penny for Your Thoughts](https://www.pennyforyourthoughts.ai/) - Honcho/x402-powered personal expertise market ( #penny)
- [YouSim](https://yousim.ai) - Honcho-powered identity simulator ( #yousim)
**COMMUNITY**
- [Xeno Grant](https://x.com/xenograntai) - Direct-to-agent grants program ( #grants)

View File

@ -1,27 +1,31 @@
---
title: "Comprehensive Analysis of Design Patterns for REST API SDKs"
date: 05.09.2024
tags: ["blog", "dev"]
author: "Vineeth Voruganti"
title: "ARCHIVED: A Comprehensive Analysis of Design Patterns for REST API SDKs"
date: 05.09.24
tags:
- blog
- dev
- archive
author: Vineeth Voruganti
description: A deep dive into SDK design patterns, comparing object-oriented vs singleton approaches & evaluating code generation platforms for API client libraries.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post contains Vineeth's (Plastic's Co-founder & CTO) notes on REST API SDK design patterns that informed how we built Honcho's client libraries. Some patterns described here have been superseded by our shift toward LLM-native interfaces, but the analysis of pagination, error handling, & developer experience remains useful for anyone building API tooling.
>
> For the most up-to-date SDK reference, check out the [Honcho Docs](https://docs.honcho.dev).
>
> Enjoy.
This post is adapted from [vineeth.io](https://vineeth.io/posts/sdk-development)
and written by [Vineeth Voruganti](https://github.com/VVoruganti)
## TL;DR
*This post is adapted from [vineeth.io](https://vineeth.io/posts/sdk-development)*
# TL;DR
After several months of managing the SDKs for Honcho manually, we decided to
take a look at the options available for automatically generating SDKs.
From our research we picked a platform and have made brand new SDKs for Honcho
From our research we picked a platform and have made brand-new SDKs for Honcho
that use idiomatic code, are well documented, and let us support more languages.
---
For the past few months I have been working on managing the
[Honcho](https://honcho.dev) project and its associated SDKs. We've been taking
the approach of developing the SDK manually as we are focused on trying to find
the best developer UX and maximize developer delight.
# Introduction
For the past few months I have been working on managing the [Honcho](https://honcho.dev) project and its associated SDKs. We've been taking the approach of developing the SDK manually as we are focused on trying to find the best developer UX and maximize developer delight.
This has led to a rather arduous effort that has required a large amount of
refactoring as we are making new additions to the project, and the capabilities
@ -30,20 +34,15 @@ of the platform rapidly expand.
While these efforts have been going on a new player in the SDK generation space
dropped on [hacker news](https://news.ycombinator.com/item?id=40146505).
When I first started working on **Honcho** I did a cursory look at a number of SDK
When I first started working on Honcho I took a cursory look at a number of SDK
generators, but wasn't impressed with the results I saw. However, a lot of that
was speculative and Honcho was not nearly as mature as it is now.
So, spurred by the positive comments in the thread above, I've decided to take a
more detailed look into the space and also try to develop a better understanding
of what approaches are generally favorable in creating API client libraries.
## Background
For a full understanding of Honcho I recommend the great [[A Simple Honcho
Primer|Simple Honcho
Primer]] post, but I'll
try to summarize the important details here.
# Background
For a full understanding of Honcho I recommend the great [[ARCHIVED; A Simple Honcho Primer|Simple Honcho Primer]] post, but I'll try to summarize the important details here.
Honcho is a personalization platform for LLM applications. It is infrastructure
that developers can use for storing data related to their applications, deriving
@ -82,9 +81,7 @@ session = user.create_session()
There is an Async version of the SDK with an `AsyncHoncho` class that uses
objects such as `AsyncSession` and `AsyncUser`.
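As a rough sketch of that shape (only `Honcho`, `AsyncHoncho`, `AsyncSession`, `AsyncUser`, and `user.create_session()` are confirmed above; the import path and other method names are assumptions for illustration):

```python
import asyncio

from honcho import Honcho, AsyncHoncho  # assumed import path

# Synchronous flow: the client hands out user objects, users hand out sessions.
honcho = Honcho()
user = honcho.get_or_create_user("alice")  # hypothetical method name
session = user.create_session()            # shown in the snippet above
session.create_message("Hello!")           # hypothetical method name

# Async flow mirrors it with the Async* classes.
async def main() -> None:
    client = AsyncHoncho()
    user = await client.get_or_create_user("alice")  # hypothetical; AsyncUser
    session = await user.create_session()            # returns an AsyncSession

asyncio.run(main())
```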
## Guiding Questions
# Guiding Questions
Before evaluating the below platforms I wanted to investigate a few questions I
had about how to design SDKs and how they are generally maintained in other
organizations. I've also included some questions I want to think about when
@ -107,9 +104,7 @@ Platform Specific Questions
3. How easy was it to use the tool?
4. What approach does the tool take? Object-oriented or singleton?
5. How does it handle async vs sync interfaces?
## Research
# Research
> First I took a look at sources and posts online that talk in general about
> developing SDKs. This isn't an exhaustive look at every link I looked at, but
> ones I thought were relevant. The notes are messy and not necessarily fully
@ -173,8 +168,7 @@ the end.
At the time of this research there was no follow-up post.
[Ask HN: Best practices (and examples) for designing client libraries for
APIs?](https://news.ycombinator.com/item?id=23283551)
[Ask HN: Best practices (and examples) for designing client libraries for APIs?](https://news.ycombinator.com/item?id=23283551)
The first comment actually advocates for an object-oriented model but just using
the top level client object for authentication and setup stuff.
@ -298,16 +292,13 @@ Some key insights
- Have modular design patterns that make it easy to extend and pick and choose
features.
[Should I implement OOP in a REST
API?](https://www.reddit.com/r/flask/comments/1755ob0/should_i_implement_oop_in_a_rest_api/)
[Should I implement OOP in a REST API?](https://www.reddit.com/r/flask/comments/1755ob0/should_i_implement_oop_in_a_rest_api/)
Most people seem to be saying a full OOP method is overkill, but there are
people advocating for having a controller class with methods that take data
objects as inputs. Essentially, they're advocating for the singleton approach
with data-only objects.
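To make the distinction concrete, here's a minimal, library-agnostic sketch of the two shapes (all names invented for illustration):

```python
from dataclasses import dataclass

# --- Singleton / flat style: one client, data-only objects ---
@dataclass
class Message:
    session_id: str
    content: str

class FlatClient:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def create_session(self, user_id: str) -> str:
        ...  # e.g. POST /users/{user_id}/sessions, returns a session id

    def add_message(self, message: Message) -> None:
        ...  # e.g. POST /sessions/{message.session_id}/messages

# --- Object-oriented style: resources returned as live objects ---
class Session:
    def __init__(self, client: "OOClient", session_id: str):
        self._client = client
        self.id = session_id

    def add_message(self, content: str) -> None:
        ...  # the session already knows its own id and client

class User:
    def __init__(self, client: "OOClient", user_id: str):
        self._client = client
        self.id = user_id

    def create_session(self) -> Session:
        return Session(self._client, session_id="...")

class OOClient:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def get_user(self, user_id: str) -> User:
        return User(self, user_id)
```

The flat client keeps every network call visible at the top level; the object-oriented client reads more naturally but hides which calls happen where, which is the readability trade-off discussed below.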
### Analysis
## Analysis
Many of the generic concerns of SDK design do not have to do with the UX of the
SDK for the end developer, but rather with the background processes that an SDK
handles. These include:
@ -339,18 +330,12 @@ but the object-oriented approach may not be as readable, and it could be unclear
what methods are doing in complex codebases. Even GPT-4 couldn't decide between
the two.
![Asking GPT-4 about Singleton vs Object-Oriented
Approaches](/assets/sdk-gpt-4.png)
![Asking GPT-4 about Singleton vs Object-Oriented Approaches](/assets/sdk-gpt-4.png)
Again and again, the best way to approach SDK development is to just do whatever
is easier, and create tons of documentation that will help developers navigate
your [API Ladder](https://blog.sbensu.com/posts/apis-as-ladders/). Someone will
get confused regardless of what you do, so the key is to make sure the SDK makes
sense (even if it's not the most efficient or clean) and remove hurdles for
users to navigate errors and mistakes.
## SDK Generation Platforms
your [API Ladder](https://blog.sbensu.com/posts/apis-as-ladders/). Someone will get confused regardless of what you do, so the key is to make sure the SDK makes sense (even if it's not the most efficient or clean) and remove hurdles for users to navigate errors and mistakes.
# SDK Generation Platforms
With a sense of the best standards for SDK design and additional features that
should be supported in the SDK, I want to look at a few different options to
determine the best solution to go with.
@ -364,9 +349,7 @@ Below is a list of the different platforms I wanted to review
I was using the OpenAPI Spec for Honcho that was housed at
https://demo.honcho.dev/openapi.json.
### Stainless
## Stainless
Since the Hacker News thread for the release of Stainless is what spurred this
research, I decided to try them out first.
@ -381,9 +364,7 @@ of the interface. There were also built-in capabilities for retries, pagination,
and auth.
There's also capability for adding custom code such as utility functions.
### Speakeasy
## Speakeasy
Speakeasy required me to do everything locally through their `brew` package. It
did not immediately accept the OpenAPI Spec and required me to make some tweaks.
These were low-hanging fruit, and their CLI has a handy AI tool that will
@ -397,9 +378,7 @@ The generated SDK didn't feel as strong as the Stainless one. It didn't seem
to support `async` methods, it did not use `pydantic`, and it used the built-in
Python `@dataclass`. The methods had really unwieldy names, and it looked like it
would need a lot of tweaking to get it more production-ready.
### Liblab
## Liblab
Liblab also had me do the generation from the CLI using their npm package. It was
pretty straightforward to log in and give it an API spec. Liblab seems to require
a lot of tweaking to get better results. It gave me several warnings asking me to
@ -414,8 +393,7 @@ which seems to be the industry standard for codegen tools. The method names
were also unwieldy. It also didn't make use of pydantic and instead implemented
its own `BaseModel` class. It was built on the `requests` library and doesn't seem
to support `async` methods.
### OpenAPI Generator
## OpenAPI Generator
This is the only one on the list that is not expressly backed by a company
whose main goal is SDK generation. It is however a very popular project with
@ -435,9 +413,7 @@ Once again, the SDK uses the `singleton` approach.
I also did not see any indication of functionality for retry logic,
authentication, or pagination.
### Conclusion
## Conclusion
Overall, Stainless had the results that I liked the most. With almost no work
from me, it produced a high-quality SDK that designed things in a sensible way
with many built-in features such as retries, pagination, and auth.
@ -459,9 +435,7 @@ What I'm looking for right now is the platform or tool that can reduce my work
the most and let me focus on other things, and Stainless achieved that. The
results are not perfect, but it doesn't look like it'll need more than some
slight tweaking and testing to get to a state I want.
## Results
# Results
After reaching the conclusion in the previous section, I took some time to fully
implement Stainless to make SDKs for Honcho and am proud to announce the release
of a new Python SDK, and the launch of a brand-new NodeJS SDK.

View File

@ -1,5 +1,5 @@
---
title: "Honcho: User Context Management for LLM Apps"
title: "ARCHIVED: Honcho: User Context Management for LLM Apps"
enableToc: true
date: 01.18.24
tags:
@ -8,34 +8,45 @@ tags:
- philosophy
- ml
- announcements
- archive
author: Courtland Leer & Vince Trost
description: Introducing Honcho, an open-source user context management framework for LLM applications that enables personalized, user-first AI experiences at scale.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This is the [Honcho](https://honcho.dev) origin story--our first public announcement of the project.
>
> We first pitched it as "an open-source version of the OpenAI Assistants API" for managing AI app data on a per-user basis. The architecture described here has evolved into Honcho's current "[[Beyond the User-Assistant Paradigm; Introducing Peers|peer paradigm]]," which unifies users & AI agents as Peers & supports much more sophisticated memory, continual learning, & [[Memory as Reasoning|powerful reasoning]].
>
> But this post also captures Honcho's founding vision: that the "missing piece of the stack" was user context, that LLMs are uniquely suited to get to know users in ways traditional software couldn't, & that personalization would be table stakes for AI apps.
>
> If you want to understand where Honcho came from & why we built it, start here.
>
> Enjoy.
![[missing_piece.png]]
*The missing piece of the stack*
## TL;DR
Today we drop the first release of a project called [*Honcho*](https://github.com/plastic-labs/honcho/tree/main), an open-source version of the OpenAI Assistants API. Honcho manages your AI app data on a per-user basis, allowing for multiple concurrent sessions. Glaringly absent from the existing stack, Honcho will, at full maturity, usher the advent of atomic, disposable agents that are user-first by default.
## Plastic Lore
# TL;DR
*Today we drop the first release of a project called [Honcho](https://github.com/plastic-labs/honcho/tree/main), an open-source version of the OpenAI Assistants API. Honcho manages your AI app data on a per-user basis, allowing for multiple concurrent sessions. Glaringly absent from the existing stack, Honcho will, at full maturity, usher in the advent of atomic, disposable agents that are user-first by default.*
# Plastic Lore
[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models. ^09f185
As a team with backgrounds in both machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining long-term potential. Fundamentally, LLMs were and still are 1-to-many instructors. Yes, they herald the beginning of a revolution in personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short.
![[zombie_tutor_prompt.jpg]]
*A well intentioned but monstrously deterministic [tutor prompt](https://www.oneusefulthing.org/p/assigning-ai-seven-ways-of-using).* ^dfae31
*A well-intentioned but monstrously deterministic [tutor prompt](https://www.oneusefulthing.org/p/assigning-ai-seven-ways-of-using).* ^dfae31
Most edtech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing edtech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and continues to translate to a huge missed opportunities.
Most EdTech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing EdTech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and continues to translate to huge missed opportunities.
So we set out to build a non-skeuomorphic, AI-native tutor that put users first. The same indeterminism so often viewed as LLMs' greatest liability is in fact their greatest strength. Really, it's what they _are_. When great teachers deliver effective personalized instruction, they don't consult some M.Ed flowchart, they leverage the internal personal context they have on the student and reason (consciously or basally) about the best pedagogical intervention. LLMs are the beginning of this kind of high-touch learning companion being _synthetically_ possible.
![[teacher_shoggoth.png]]
*We're not so different after all ([@anthrupad](https://twitter.com/anthrupad)).*
Our [[Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free--precisely because we built [cognitive architectures](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) that mimic the theory-of-mind expertise of highly efficacious 1:1 instructors.
## Context Failure Mode
But we quickly ran up against a hard limitation. The failure mode we believe all vertical specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the latent potential. That's context, specifically user context--Bloom didn't know enough about each student.
Our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free--precisely because we built [cognitive architectures](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) that mimic the theory-of-mind expertise of highly efficacious 1:1 instructors.
# Context Failure Mode
But we quickly ran up against a hard limitation. The failure mode we believe all vertical-specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the latent potential. That's context, specifically user context--Bloom didn't know enough about each student.
We're consistently blown away by how many people don't realize large language models themselves are stateless. They don't remember shit about you. They're just translating context they're given into probable sequences of tokens. LLMs are like horoscope writers, good at crafting general statements that *feel* very personal. You would be too, if you'd ingested and compressed that much of the written human corpus.
@ -53,9 +64,7 @@ The real magic of 1:1 instruction isn't subject matter expertise. Bloom and the
Large language models can be good at this too. With similar compression and generation abilities, they're uniquely suited (among existing technology) to get to know you. We really can have shared culture and relationships with LLMs, absent (if we like) any cringy anthropomorphism.
Bloom needed a mechanism to harvest and utilize more context about the student. So we built it one.
## Research Solutions
# Research Solutions
Prediction algorithms have become phenomenal at hacking attention using tabular engagement and activity data. But if we're thinking LLM-natively, a few questions emerge:
1. How are LLMs uniquely positioned to understand users?
@ -75,21 +84,16 @@ Late last year we published a [research pre-print on this topic](https://arxiv.o
*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared), from our research.*
We added it to Bloom and found the missing piece to overcoming the failure mode of user context. Our tutor could now learn about the student and use that knowledge effectively to produce better learning outcomes.
## Blast Horizon
Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users, all growing in unique ways, all needing personalized attention that evolved over multiple longform sessions, forced us to confront the user context management problem with all it's thorny intricacy and potential.
# Blast Horizon
Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users, all growing in unique ways, all needing personalized attention that evolved over multiple long-form sessions, forced us to confront the user context management problem with all its thorny intricacy and potential.
And we're hearing constantly from builders of other vertical-specific AI apps that personalization is the key blocker. In order for projects to graduate from toys to tools, they need to create new kinds of magic for their users. Mountains of mostly static software exist to help accomplish an unfathomable range of tasks and lots of it can be personalized using traditional (albeit laborious for the user) methods. But LLMs can observe, reason, then generate the software _and the user context_, all abstracted away behind the scenes.
Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family, without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree.
This is the kind of future we can build when we put users at the center of our agent and LLM app production.
## Introducing Honcho
# Introducing Honcho
^a9d0f8
So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. ^8c982b
Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md).
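Illustratively, talking to that REST API directly might look something like this (the routes here are hypothetical stand-ins, not the actual schema; see the linked README for the real interface):

```python
import requests

BASE = "https://demo.honcho.dev"  # hypothetical deployment URL

# Hypothetical routes, for illustration only: create a user, open a
# session for them, then append a message to that session.
user = requests.post(f"{BASE}/users", json={"name": "alice"}).json()
session = requests.post(f"{BASE}/users/{user['id']}/sessions", json={}).json()
requests.post(
    f"{BASE}/users/{user['id']}/sessions/{session['id']}/messages",
    json={"is_user": True, "content": "Hello, Honcho!"},
)
```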
@ -98,12 +102,10 @@ Honcho is a REST API that defines a storage schema to seamlessly manage your app
We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably taking it down because of infrastructure/scaling issues.
Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Speedrunning to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup.
Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Speed-running to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup.
Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a data management schema on a per-user basis, lots of new possibilities emerge around what you can do with intra-user messages, intra-user sessions, and even intra-user sessions across an ecosystem of agents.
## Get Involved
# Get Involved
We're excited to see builders experiment with what we're releasing today, and with Honcho as it continues to evolve.
Check out the [GitHub repo](https://github.com/plastic-labs/honcho) to get started and join our [Discord](https://discord.gg/plasticlabs) to stay up to date 🫡.

View File

@ -1,26 +1,33 @@
---
title: Introducing Honcho's Dialectic API
title: "ARCHIVED: Introducing Honcho's Dialectic API"
date: 03.26.24
tags:
- dev
- ml
- announcements
- blog
- archive
author: Courtland Leer, Vince Trost, & Vineeth Voruganti
description: Announcing the Dialectic API--an LLM-native endpoint enabling agent-to-agent chat in natural language for dynamic user personalization.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post announced Honcho's Dialectic API--an LLM-native endpoint for just-in-time agent-to-agent context queries in natural language. This endpoint has since evolved into the much more powerful `.chat` method in Honcho today. The Dialectic API was ahead of its time, and its successor remains state-of-the-art.
>
> Here we lay out the reasoning behind the development of this feature. We get into the case for natural language as a substrate for agent coordination, the argument that rigid API specs constrain what's now possible, & a vision of agents collaboratively reasoning about how to personalize UX--all thinking that's shaped everything we've built since.
>
> Enjoy.
![[agent_dialectics.jpeg]]
## TL;DR
# TL;DR
*Our [Dialectic API](https://docs.honcho.dev/guides/dialectic-endpoint) is an LLM-native way for your AI application to discuss user context with Honcho. It allows for direct LLM-to-LLM communication in natural language.*
Our [Dialectic API](https://docs.honcho.dev/guides/dialectic-endpoint) is an LLM-native way for your AI application to discuss user context with Honcho. It allows for direct LLM-to-LLM communication in natural language.
Agents need ways to interface dynamically and autonomously, free from the rigidness of traditional APIs. We're building that substrate.
## What's a Dialectic API?
[Honcho](https://honcho.dev) is our platform for personalizing agents to users. Currently, it includes [[Honcho; User Context Management for LLM Apps#^a9d0f8|session storage]], BYO context storage, passive [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind]] user modeling, and *now* an agent deeply coupled to all of that rich user context. That agent can be called via our Dialectic API to surface user data for use with any cognitive architecture.
### How It Works
In designing an LLM pipeline and an application's cognitive architecture, you'll need to decide where and how to inject personal user context so the task is [[Machine learning is fixated on task performance|not simply completed in a general way]], but in the most appropriate way for [[User State is State of the Art|each specific user]].
*Agents need ways to interface dynamically and autonomously, free from the rigidness of traditional APIs. We're building that substrate.*
# What's a Dialectic API?
[Honcho](https://honcho.dev) is our platform for personalizing agents to users. Currently, it includes [[ARCHIVED; Honcho; User Context Management for LLM Apps#^a9d0f8|session storage]], BYO context storage, passive [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind]] user modeling, and *now* an agent deeply coupled to all of that rich user context. That agent can be called via our Dialectic API to surface user data for use with any cognitive architecture.
## How It Works
In designing an LLM pipeline and an application's cognitive architecture, you'll need to decide where and how to inject personal user context so the task is [[Machine learning is fixated on task performance|not simply completed in a general way]], but in the most appropriate way for [[ARCHIVED; User State is State of the Art|each specific user]].
That's when your agent asks Honcho for what it needs in natural language. This query can take many forms. Some possibilities:
@ -36,23 +43,19 @@ That's when your agent asks Honcho for what it needs in natural language. This q
- A static fact about user identity
- A piece of user data to use in improving your app's overall vertical or user-specific service
Key to note here is the ability to hard code the most useful type of Honcho query for your app's use case *or*--better yet--to [[Extrusion 02.24|trust your agent]] to reason autonomously about what it needs based upon the current session (or any other criteria) and feed that to Honcho. Or run a hybrid approach. This can be done synchronously with an inference/session or async as needed.
Key to note here is the ability to hard code the most useful type of Honcho query for your app's use case *or*--better yet--to [[On intellectual respect|trust your agent]] to reason autonomously about what it needs based upon the current session (or any other criteria) and feed that to Honcho. Or run a hybrid approach. This can be done synchronously with an inference/session or async as needed.
In this way, Honcho becomes a self-improving oracle to the identity of each and every one of your app's users. Any agent can chat with a representation of a user (as Honcho) on the backend.
Honcho responds to queries in the same format--natural language. Most simply, this is just a conversation between two agents, *collaboratively* reasoning about the best way to personalize UX. Agent-to-agent chat over users.
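A minimal app-side sketch of that exchange (the `dialectic` function below is a hypothetical stand-in for the endpoint, not its actual signature):

```python
def dialectic(user_id: str, query: str) -> str:
    """Hypothetical wrapper over Honcho's Dialectic endpoint."""
    ...  # POST the natural-language query; return the natural-language answer

# The app's agent asks for exactly the context it needs, in plain English...
answer = dialectic(
    user_id="alice",
    query="What does this user already understand about options trading, "
          "and how do they prefer to receive corrections?",
)

# ...and folds Honcho's natural-language answer straight into its own prompt.
system_prompt = (
    "You are a financial coach.\n"
    f"What we know about this user: {answer}\n"
    "Personalize your next response accordingly."
)
```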
In the coming weeks, we'll release a number of off the shelf options to plug into any cognitive architecture and demos to illustrate more custom utility. We expect to see (and are already seeing in [our private beta](https://plasticlabs.typeform.com/honchobeta)) lots of novel ways to prompt Honcho effectively.
### Why We Built It
In the coming weeks, we'll release a number of off-the-shelf options to plug into any cognitive architecture and demos to illustrate more custom utility. We expect to see (and are already seeing in [our private beta](https://plasticlabs.typeform.com/honchobeta)) many novel ways to prompt Honcho effectively.
## Why We Built It
Why is a dialectic API the right way to solve the problem of user context in LLM applications?
Not only is it ideal from a development and design perspective, it's optimal for the particular task of personal context and user identity.
#### The DevEx Case
### The DevEx Case
^a14c2f
Our Dialectic API is a single endpoint for everything personalization.
It reduces development overhead and allows you to get a personalized application running quickly and efficiently--speedrunning to production.
@ -62,32 +65,24 @@ For most AI apps, personalization will be a key differentiator between your agen
Further, when agents can communicate directly using natural language, there's no need to learn and manage a complicated API specification. Or for us to build it. Since LLMs are proficient at interpreting the intricacies of natural language, there's a functionally infinite number of ways to ask Honcho a question and get a satisfactory result. Far superior to brittle and strict legacy APIs.
However, this doesn't mean the developer now needs to be a prompting expert, fluent in all its esoterica. Honcho is an expert in personal context and theory of mind reasoning, so your prompts can be adaptive and ad hoc, and Honcho will figure out the rest. When you're ready, you can even offload the queries to your app-side LLM.
#### The ML Case
### The ML Case
^x7f7f8
Extra context improves user response generation; the more specific, the better. Focus on ML to crush your vertical; let Honcho personalize it by default.
##### Leverage Natural Language Plasticity
Each user has a [[User State is State of the Art#^5bc20b|rich and complex personal identity]]. Access to higher-fidelity representations of that identity can be combined with the task completion context of you app in each moment to generate the most optimal tokens for each user-agent interaction. I.e. ones that are felt by the user to be [[Humans like personalization|more personalized and satisfactory]]--enhancing the real and perceived time to value ratio of your app.
#### Leverage Natural Language Plasticity
Each user has a [[ARCHIVED; User State is State of the Art#^5bc20b|rich and complex personal identity]]. Access to higher-fidelity representations of that identity can be combined with the task completion context of your app in each moment to generate the most optimal tokens for each user-agent interaction. I.e. ones that are felt by the user to be [[Humans like personalization|more personalized and satisfactory]]--enhancing the real and perceived time to value ratio of your app.
But that complexity is hard to capture and needlessly constrained with typical API design. In order to express the nuance of personal context, we need the high variance, dynamic nature of natural language.
Because LLMs consider tokens in relation to a vast [[LLMs excel at theory of mind because they read|human narrative space]], we're much closer to *semantic* machine understanding than ever. Personal context allows you to target parts of the latent space most useful in generating tokens for specific users in specific settings. The only way we know to communicate and leverage that depth is with the inherent diversity of natural language...which is itself evolutionarily optimized to describe human identity well.
Way richer than running RAG over a vector store of session logs. Or a stateless, CRUD-inspired API spec.
##### Out-Compete Foundation Models
#### Out-Compete Foundation Models
Honcho's Dialectic API also allows you to build training examples with rich theory of mind context. Those datasets can help you outperform foundation models in your specific vertical and its set of tasks.
By adding additional context to inputs, the distribution of responses your model samples from can be improved. Any sort of "reasoning" the language model exhibits in a single inference is due to learned patterns in the dataset. So if you can create examples that can help it learn better patterns, you can improve the "reasoning" steps it exhibits.
Ultimately, we're learning ways of responding that foundation models won't. Using theory of mind context yields more specific examples, which allows more robust domain-specific training.
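For example, a single training pair augmented with theory-of-mind context might be assembled like this (the format and field names are ours, purely for illustration):

```python
import json

# A vertical-specific exchange, enriched with the user context Honcho
# surfaced at the time. Training on pairs like this teaches the model
# response patterns conditioned on who it's talking to.
example = {
    "input": (
        "User context: prefers terse answers; has asked about tax "
        "deductions twice this week; self-employed.\n\n"
        "User: Can I write off my home office?"
    ),
    "output": (
        "Yes--since you're self-employed, the home-office deduction "
        "likely applies. Want the two-line version of the rules?"
    ),
}

# Append to a JSONL training set, one example per line.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```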
### Why "Dialectic"?
## Why "Dialectic"?
In the classical sense, a *dialectic* process is one where two parties seek to arrive at the truth via reasoned dialogue.
(In our case, the truth is a solution for delivering the optimal per-app, per-user, per-session experience.)
@ -95,9 +90,7 @@ In the classical sense, a *dialectic* process is one where two parties seek to a
We've termed our API this way because not only is it communication between software systems, but it's a reasoned discourse between agents to reach the ideal conclusion.
Each agent has a different set of information; the free discussion allows them to eliminate that asymmetry and arrive at a synthesis greater than the sum of its parts. One agent is an expert in delivering a service in its vertical, the other in modeling user identity and surfacing relevant, timely context based on that representation.
## The Agentic Substrate
# The Agentic Substrate
Our Dialectic API is part of an evolutionary lineage. One that records humanity's slow discovery of all the ways machines can communicate with one another--from telegraph and punch cards to REST and GraphQL. Along each axis of typical machine comm improvement, agent-to-agent dialectics offer advantages:
- **Speed** - user time to value can be optimized with granular personal context requests
@ -109,22 +102,18 @@ Our Dialectic API is part of an evolutionary lineage. One that records humanity'
As the commodification of inference and intelligence is coupled with growing general foundation model capability, application developers will naturally be pushed toward greater and greater vertical specificity. This will drive the development of increasingly atomic agents, ones that excel at very narrow tasks.
This explosion of such agent microservices, will have to include the evolution of systems for agent-agent communication and transaction. If agents are going to collaborate and get shit done for us, they need native ways to communicate. Beautifully, LLMs share with us and among themselves the universal interface of natural language.
This explosion of such agent micro-services will have to include the evolution of systems for agent-agent communication and transaction. If agents are going to collaborate and get shit done for us, they need native ways to communicate. Beautifully, LLMs share with us and among themselves the universal interface of natural language.
We can leverage this substrate for agent coordination with more depth and nuance than fragile trad API design. Doubtless, categories of agents will find more efficient symbol structures for cooperation in specific, repetitive cases. But discourse in natural language remains always available as a rich foundational protocol. And as we've explored, it's the ideal starting place for transmitting insights about human identity.
We can leverage this substrate for agent coordination with more depth and nuance than fragile trad API design. Doubtless, categories of agents will find more efficient symbol structures for cooperation in specific, repetitive cases. But discourse in natural language always remains available as a rich foundational protocol. And as we've explored, it's the ideal starting place for transmitting insights about human identity.
This is just the start. Just like you can append memory and tools to an LLM, we can augment this substrate in a number of ways--from designing multi-party protocols, to enabling zero-knowledge or confidential environments, to recording transactional data on blockchains or other types of public or private immutable ledgers.
That kind of richness puts us one step closer to the dream of a semantic web, one as replete with meaning as the physical world *and* machine grokkable. What *matters* to me can be used to personalize an atomic agent *just in time*, without sacrificing important context. Intelligent microservices can be more aligned with me than human economic actors and professional services, which are plagued with high-latency interest misalignment and information asymmetry.
That kind of richness puts us one step closer to the dream of a semantic web, one as replete with meaning as the physical world *and* machine grokable. What *matters* to me can be used to personalize an atomic agent *just in time*, without sacrificing important context. Intelligent micro-services can be more aligned with me than human economic actors and professional services, which are plagued with high-latency interest misalignment and information asymmetry.
Honcho and agent dialectics can eliminate the principal-agent problem for this new economic paradigm, digitally extending human agency and identity further than ever before.
## Private Beta
# Private Beta
Our Dialectic API is now available in private beta.
We're working closely with a diverse array of projects across many different verticals in various stages of development--from ideation to production.
If you're excited to build with a hosted version of Honcho and explore the ideas covered here, [sign up for our waitlist](https://plasticlabs.typeform.com/honchobeta).
And in the meantime, [join our Discord](https://discord.gg/plasticlabs) and tell us what you're working on!

View File

@ -1,5 +1,5 @@
---
title: Memories for All
title: "ARCHIVED: Memories for All"
date: 02.15.24
tags:
- blog
@ -7,28 +7,36 @@ tags:
- announcements
- philosophy
- ml
- archive
author: Courtland Leer
description: An open-source reimplementation of OpenAI's memory features using Honcho, enabling any AI app to derive & store personal context about users.
---
## TL;DR
Personalization is the next frontier. OpenAI gets it:
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post was our response to OpenAI announcing "memory" in ChatGPT--we built an open-source reimplementation using [Honcho](https://honcho.dev) to show anyone could add superior user memory to their apps. The specific LangChain patterns & code examples here are far outdated; Honcho is much more powerful & the architecture has matured significantly (dig in to that [here](https://docs.honcho.dev), [[Beyond the User-Assistant Paradigm; Introducing Peers|here]], & [[Memory as Reasoning|here]]).
>
> A key prediction discussed here turned out to be remarkably prescient: walled gardens will seek to lock user context inside their ecosystems, leaving independent developers & privacy-conscious users out in the cold. And we argued for generative personalization--letting LLMs autonomously decide what matters about users rather than rigidly prescribing it--another Plastic thesis that's winning out.
>
> Enjoy.
# TL;DR
*Personalization is the next frontier. OpenAI gets it:*
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Were testing ChatGPT&#39;s ability to remember things you discuss to make future chats more helpful. <br><br>This feature is being rolled out to a small portion of Free and Plus users, and it&#39;s easy to turn on or off. <a href="https://t.co/1Tv355oa7V">https://t.co/1Tv355oa7V</a> <a href="https://t.co/BsFinBSTbs">pic.twitter.com/BsFinBSTbs</a></p>&mdash; OpenAI (@OpenAI) <a href="https://twitter.com/OpenAI/status/1757469997742666052?ref_src=twsrc%5Etfw">February 13, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
Super exciting.
*Super exciting.*
But what about the rest of us?
*But what about the rest of us?*
Welp, we built an open source reimplementation of OpenAI's 'memory' features using [Honcho](https://honcho.dev) to effortlessly organize sessions on a per-user basis
*Welp, we built an open-source reimplementation of OpenAI's 'memory' features using [Honcho](https://honcho.dev) to effortlessly organize sessions on a per-user basis.*
You can derive facts about users, store them, and retrieve for later use. And we're shipping a demo of this implemented with the useful abstractions LangChain provides.
*You can derive facts about users, store them, and retrieve for later use. And we're shipping a demo of this implemented with the useful abstractions LangChain provides.*
The user context rabbithole goes deep, this is still just the start.
If you're building with or adjacent to Honcho, [join our Discord](https://discord.gg/plasticlabs), we'd love to help 🫡.
## OpenAI Memories
*The user context rabbit hole goes deep; this is still just the start.*
*If you're building with or adjacent to Honcho, [join our Discord](https://discord.gg/plasticlabs); we'd love to help 🫡.*
# OpenAI Memories
This week [OpenAI announced](https://openai.com/blog/memory-and-new-controls-for-chatgpt) they're testing memory in ChatGPT. Specifically this means learning about individual users in order to improve their experiences.
It's a limited initial rollout, closed under the hood, and rudimentary, but appears to include functionality for deriving facts about users from conversation history and storing those to augment later generation.
@ -38,9 +46,7 @@ There are features for users to view derived facts (memories), prune them, or tu
They're betting, we believe correctly, that the real potential here is a wealth of agents whose behavior is in *high-fidelity with user identity*.
We're pumped to see experiments like this taking place. But what if you're a developer that doesn't want to subscribe to this kind of platform dependency and all its attendant externalities? What if you're a user who wants independent or open source apps with a more mature version of these UX benefits?
## Context is Critical
# Context is Critical
At [Plastic Labs](https://plasticlabs.ai) our mission is to enable rich user memory in and across every application. Only then will we really understand just how augmentative and transformative these agents can be. We've been laser focused on this problem.
![[laser_eyes_user_soapbox.png]]
@ -49,16 +55,13 @@ Right now, the vast majority of software UX is a 1-to-many experience. What you
AI apps can deal *generatively* with each user on an individual basis, that is, an experience can be produced ad hoc for every user upon every interaction. From 1:many to 1:1 without prohibitive sacrifices in efficiency. But we're still underestimating the full scope of possibility here.
As it stands today the space is mostly focused on the (albeit generative) [[Machine learning is fixated on task performance|1:many tasks LLMs can perform]]. The apps remain more or less stateless with regard to the user. To reach 1:1 nirvana, we need more [[Honcho; User Context Management for LLM Apps|user-centric agent design]]. We need frameworks, mechanisms, services, models dedicated to deep coherence with user identity.
As it stands today the space is mostly focused on the (albeit generative) [[Machine learning is fixated on task performance|1:many tasks LLMs can perform]]. The apps remain more or less stateless with regard to the user. To reach 1:1 nirvana, we need more [[ARCHIVED; Honcho; User Context Management for LLM Apps|user-centric agent design]]. We need frameworks, mechanisms, services, models dedicated to deep coherence with user identity.
Every agent interaction can be generated just in time for every person, informed by relevant personal context more substantive than human-to-human sessions. User context will enable disposable agents on the fly across verticals for lower marginal cost than 1:many software paradigms.
<iframe width="560" height="315" src="https://www.youtube.com/embed/tTE3xiHw4Js?si=uzUzcSHFfZdjFduX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
(*Here's our co-founder [Vince](https://twitter.com/vintrotweets) talking more about some of those possibilities*)
## "Open vs Closed"
# "Open" vs "Closed"
We subscribe heavily to the spirit of arguments Harrison Chase made in ["OpenAI's Bet on Cognitive Architecture"](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) just a few months ago:
> There's a great quote from Jeff Bezos that says to [only do what makes your beer taste better](https://blog.weaverse.io/make-your-beer-taste-better?ref=blog.langchain.dev). This refers to the early industrial revolution, when breweries were also making their own electricity. A brewery's ability to make good beer doesn't really depend on how differentiated their electricity was - so those that outsourced electricity generation and focused more on brewing jumped to an advantage.
@ -82,9 +85,7 @@ Shouldn't we be able to experiment with all this without platform lock-in, allow
Developers will want control over personalization for their application without all the redundant overhead. Users will want a say in how they're being reasoned about and why.
This is our vision for Honcho.
## Intellectual Respect
# Intellectual Respect
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">llms are remarkable empaths<br><br>if youd read that much fiction, you would be too</p>&mdash; Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1753480140850626759?ref_src=twsrc%5Etfw">February 2, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
@ -96,21 +97,18 @@ There's a ton we plan to unpack and implement there, but the key insight we're h
(*If you want to go deeper into the research, [this webinar we did with LangChain](https://www.youtube.com/watch?v=PbuzqCdY0hg&list=PLuFHBYNxPuzrkVP88FxYH1k7ZL5s7WTC8) is a great place to start, as is [the "Violation of Expectations" chain they implemented](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain)*)
This release allows you to experiment with several ideas. We feed messages into an inference asking the model to derive facts about the user, we store those insights for later use, then we ask the model to retrieve this context to augment some later generation.
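In sketch form, with `llm` standing in for any completion call (the LangChain implementation linked just below is the real reference; everything here is illustrative):

```python
from typing import List

def llm(prompt: str) -> str:
    # Placeholder: swap in any chat-completion call.
    return "teaches 8th-grade biology\nprefers short answers"

def derive_facts(messages: List[str]) -> List[str]:
    # Step 1: ask the model what it can infer about the user.
    prompt = (
        "Derive concise facts about the user from these messages:\n"
        + "\n".join(messages)
    )
    return [f.strip() for f in llm(prompt).splitlines() if f.strip()]

fact_store: List[str] = []  # stand-in for Honcho's per-user storage

# Step 2: store the derived insights for later use.
fact_store += derive_facts(["I teach 8th grade bio", "keep it brief please"])

# Step 3: retrieve them to augment a later generation.
reply = llm(
    "Known facts about this user:\n" + "\n".join(fact_store)
    + "\n\nRespond to: Any ideas for tomorrow's lesson?"
)
```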
Check out our [LangChain implementation](https://docs.honcho.dev/how-to/personal-memory/simple-user-memory) and [Discord bot demo](https://discord.gg/plasticlabs).
Where things get powerful is in the aggregate. What resolves is a highly insightful picture of who your users are and what they need--a key context reservoir to improve the qualitative and quantitative experience.
N.b. you can certainly direct the model with as much verbosity as you like, but we've found during extensive experimentation that [[Theory of Mind Is All You Need|the more you trust the model]] the better and more useful the results.
N.b. you can certainly direct the model with as much verbosity as you like, but we've found during extensive experimentation that [[ARCHIVED; Theory of Mind Is All You Need|the more you trust the model]], the better and more useful the results.
This isn't surprising when you consider how much content about what people are thinking is contained in a model's pretraining. It's led to some really exciting [emergent abilities](https://arxiv.org/abs/2302.02083).
Give the model some trust and respect, and you'll be rewarded.
## Let's Build
# Let's Build
If you're experimenting with personalization, building with [Honcho](https://github.com/plastic-labs/honcho), or just interested in these ideas, [join our Discord](https://discord.gg/plasticlabs), and let's jam on what we can build together.
A healthy open ecosystem will include lots of projects trying lots of new ways to synthesize and leverage user context. We're here to support them all 🥽.

View File

@ -1,27 +1,39 @@
---
title: Open-Sourcing Tutor-GPT
date: 06.02.2023
title: "ARCHIVED: Open-Sourcing Tutor-GPT"
date: 06.02.23
tags:
- blog
- bloom
- announcements
- pedagogy
- ml
- archive
author: Courtland Leer & Vince Trost
description: Open-sourcing Bloom, our AI learning companion that uses metacognitive prompting to elicit pedagogical reasoning & theory-of-mind from LLMs.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom to focus exclusively on Honcho.
>
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first-of-its-kind personalized AI tutor, we realized three things: (1) all agents will soon need continuous learning systems to understand their users, (2) this is an extremely hard problem that every developer shouldn't have to redundantly solve, & (3) we were uniquely positioned to solve it.
>
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
>
> We wrote the following at the very beginning of that transition. It details the benefits of early efforts at model *reasoning* to enhance personalization, architecture that would later inspire Honcho, & the massive space of overhung LLM capabilities we were researching--all quite a bit ahead of its time.
>
> Enjoy.
![[assets/human_machine_learning.jpeg]]
## TL;DR
# TL;DR
Today we're [open-sourcing](https://github.com/plastic-labs/tutor-gpt) Bloom, our digital [Aristotelian](https://erikhoel.substack.com/p/why-we-stopped-making-einsteins) learning companion.
What makes [Bloom](https://bloombot.ai/) compelling is its ability to _reason pedagogically_ about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf,) enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. ^3498b7
What makes [Bloom](https://bloombot.ai/) compelling is its ability to *reason pedagogically* about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf), enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. ^3498b7
We're not seeing this in the explosion of chat-over-content tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don't achieve the necessary user intimacy. Infrastructure for that doesn't exist yet 👀.
We're now seeing this in the explosion of chat-over-content tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don't achieve the necessary user intimacy. Infrastructure for that doesn't exist yet 👀.
Our mission is to facilitate personal, [agentic](https://arxiv.org/pdf/2304.03442.pdf) AI for all. So to that end, we're (1) releasing Bloom's architecture into the wild and (2) embarking on a journey to supercharge the kind of empowering generative agents we want to see in the world.
# Neo-Aristotelian Tutoring
Right now, Bloom is a reading comprehension and writing workshop tutor. You can chat with it in [Discord](https://discord.gg/bloombotai). After supplying it a passage, Bloom can coach you toward understanding or revising a piece of text. It does this by treating the user as an equal, prompting and challenging Socratically.
We started with reading and writing in natural language because (1) native language acumen is the symbolic system through which all other fluencies are learned, (2) critical dialogue is the ideal vehicle by which to do this, and (3) that's what LLMs are best at right now.
@ -35,10 +47,8 @@ Current compute suggests we can do high-grade 1:1 for two orders of magnitude ch
It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience, yet maintain an elevated discourse.
So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in foundation models, but the key is eliciting it.
# Eliciting Pedagogical Reasoning
^x527dc
The machine learning community has long sought to uncover the full range of tasks that large language models can be prompted to accomplish on general pre-training alone (the capability overhang). We believe we have discovered one such task: pedagogical reasoning. ^05bfd8
Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind that's laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it reassesses a user's real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. ^285105
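In rough pseudocode, the loop looks like this (a minimal sketch of the two-chain pattern--the prompt text and function names are illustrative, not tutor-gpt's actual source):

```python
def llm(system: str, user: str) -> str:
    """Stand-in for any chat-completion call (OpenAI, Anthropic, etc.)."""
    return f"<completion for: {user[:48]}>"

def pedagogical_thought(history: str, student_input: str) -> str:
    # Chain 1: reason about the learner, not the content.
    return llm(
        "Given the conversation, assess the student's real-time academic needs "
        "and suggest a framework for constructing the ideal tutoring response.",
        f"{history}\nStudent: {student_input}",
    )

def tutor_response(thought: str, student_input: str) -> str:
    # Chain 2: generate the actual reply, guided by the self-suggested framework.
    return llm(
        f"You are a Socratic reading & writing tutor. Follow this plan:\n{thought}",
        student_input,
    )
```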
@ -73,9 +83,7 @@ Notice how Bloom reasons it should indulge the topic, validate the student, and
Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin](https://discord.gg/udtxycbh).
Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompts to expand utility and support multiple domains. We'll be doing work here too.
# Bloom & Agentic AI
This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in the foundation model's pre-training.
After all, if it were just about content delivery, MOOCs would've solved education. We need more than that to reliably grow rare minds. And we're already seeing Bloom excel at promoting synthesis and creative interpretation within its narrow utility.
View File
@ -1,57 +1,58 @@
---
title: "ARCHIVED: Solving The Campfire Problem with Honcho"
date: 03.14.24
tags:
- demos
- philosophy
- "#ml"
- blog
- archive
author: Courtland Leer & Vince Trost
description: How Honcho's dialectic API powers a 'curation buddy' demo that learns about you over time to become a personalized intellectual companion.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post introduced our "Curation Buddy" demo--a Discord bot that used [[ARCHIVED; Introducing Honcho's Dialectic API|Honcho's Dialectic API]] (now just the `.chat` method) to become a personalized reading companion. The technical implementation details (specific API calls, architecture diagrams) reflect an earlier version of Honcho that's since evolved substantially.
>
> But the philosophical reflection--that the atomization of media consumption leaves many in lonely intellectual silos with few shared narratives--remains an open problem. We argued that AI companions--powered by rich user context & infra like Honcho--could help rebuild those campfires.
>
> Enjoy.
![[agent_campfire.webp]]
# TL;DR
*Today we're releasing the first demo utilizing Honcho's dialectic API.[^1] Your LLM app/agent can now converse freely with [Honcho](https://honcho.dev)(-as-agent) about a user in natural language: agent-to-agent chat over user context.*
*The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[ARCHIVED; Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.*
The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.
*Our fractured media landscape is a far cry from narrative meaning making around the tribal campfire. Despite the connective power of the web, many of us subsist in lonely intellectual silos, more diverse but less fulfilling than social discourse.*
*We call this The Campfire Problem and expect to see lots of apps working to solve parts of it using generative AI, Honcho, and other emerging technologies. Hopefully today's demo affords a glimpse of what's becoming possible.*
# A *Curation Buddy* Demo
It's a constant problem: you're dying to talk to someone about this mind-blowing thing you read, but no one else you know is into your weird shit, plus--like you--they're all drowning in infinite read-it-later hell.
Enter *Curation Buddy*.
## Overview
Curation Buddy is an LLM application. It's a Discord bot you can chat with. Share links to any text-based media and have substantive conversation.
It uses Honcho to personalize the UX. As you converse, Honcho learns about you. It reasons about the links and conversation to uncover insight into your knowledge, interests, beliefs, desires, [[ARCHIVED; User State is State of the Art|state]], etc.
This account of user state can then be leveraged by Curation Buddy to behave like a trusted, close intellectual companion.
![[curation_buddy_overview.png]]
## What the App Does
Curation Buddy will have a discussion with you about the content in links you drop into chat. It does this by generating a "thought" about your (the user's) needs and listing out any additional data it could use to better address them.
We parse out that list and loop over it, making requests to Honcho's dialectic endpoint. Honcho returns responses to those questions; they get aggregated into a list and injected as context to hydrate the prompt that Curation Buddy uses to generate the response to the user.
![[curation_agent.png]]
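The app-side loop is easy to sketch (hedged pseudocode--`llm` stands in for any completion call and `honcho_dialectic` for the dialectic endpoint client, e.g. a `session.chat()` wrapper):

```python
def llm(system: str, user: str) -> str:
    return f"<completion for: {user[:48]}>"  # stand-in for a model call

def honcho_dialectic(question: str) -> str:
    return f"<Honcho's answer to: {question}>"  # stand-in for the dialectic endpoint

def respond(history: list[str], user_msg: str) -> str:
    # 1. Generate a "thought": what else would help address this user's needs?
    thought = llm(
        "List, one per line, questions about the user whose answers would "
        "help you respond well to their message.",
        "\n".join(history) + f"\nUser: {user_msg}",
    )
    questions = [q.lstrip("- ").strip() for q in thought.splitlines() if q.strip()]

    # 2. Loop over the list, sending each question to Honcho.
    answers = [honcho_dialectic(q) for q in questions]

    # 3. Hydrate the response prompt with the aggregated personal context.
    return llm(
        "You are Curation Buddy, a personalized reading companion.\n"
        "What Honcho knows about this user:\n" + "\n".join(answers),
        user_msg,
    )
```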
## What Honcho Does
Concurrently, Honcho is listening for writes to its database. Once it detects a write, it fires off a callback function to derive facts about the user's message.
These facts get embedded and stored in the user's personal vector database. Then when Curation Buddy generates its list of additional info it wants to know, it sends each of those requests to Honcho and Honcho runs RAG over that personal data store. It uses the returned facts to generate a response for Curation Buddy.
![[honcho_agent.png]]
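Sketched in the same hedged pseudocode (a toy in-memory store--real Honcho embeds facts into per-user vector `collections` and ranks them by similarity):

```python
def llm(system: str, user: str) -> str:
    return f"<completion for: {user[:48]}>"  # stand-in for a model call

# Toy per-user fact store; no real embeddings or ranking here.
stores: dict[str, list[str]] = {}

def on_write(user_id: str, message: str) -> None:
    # Callback fired when a message write is detected: derive & store facts.
    facts = llm("Extract discrete facts about the user, one per line.", message)
    store = stores.setdefault(user_id, [])
    store.extend(f.strip() for f in facts.splitlines() if f.strip())

def dialectic(user_id: str, question: str) -> str:
    # RAG over the personal store: retrieve relevant facts, answer with them.
    relevant = stores.get(user_id, [])[:5]  # real impl: vector search, top-k
    return llm(
        "Answer the question using only these facts about the user:\n"
        + "\n".join(relevant),
        question,
    )
```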
## Feature Ideas
We'd love to see someone run with and extend this demo. Here are some further Honcho-powered feature ideas beyond today's scope:
- Personal context informed storage for web content from links
@ -60,7 +61,7 @@ We'd love to see someone run with and extend this demo. Here are some further Ho
- Construct and maintain full-fledged user knowledge graphs
- Automatic bespoke summaries of links informed by graph
- Use Honcho to create training examples for [[ARCHIVED; User State is State of the Art|user-specific curation models]]
- Autonomously generated user newsletters to supplement conversations async
@ -69,15 +70,11 @@ We'd love to see someone run with and extend this demo. Here are some further Ho
Further, there's lots of comparable potential for any reading, media, learning, or companionship application.
If you're interested in building something adjacent to any of this, [hop in our Discord](https://discord.gg/plasticlabs), we'd love to support you.
# The Campfire Problem
We wanted to highlight Honcho's utility in this vertical because it's one where we simultaneously hear a lot of excitement and a lot of pain points. Clearly many are hungry for more social, more satisfying ways to consume and digest media, and optimists seem to share the intuition that AI has a role to play here.
We think Honcho and the personal context solutions it provides are the key.
## The Campfire
For most of human history, groups, tribes, nations drank from the same informational tap. In fact, when we see changes in how information flows, we see dramatic corresponding historical effects. Alterations in distribution--writing, printing, browsing, disaster--have altered the balance of power, the minds of billions, the course of civilization.
But the further step of processing that information and the shaping of it into *shared* narratives have played an equally enormous role. Narrative and meaning making are fundamentally social tasks. We still have to decide what to do with information, what it *means*, and we've generally done that with our neighbors.
@ -89,9 +86,7 @@ Consider the campfires of hunter-gatherers, agoras of classical city-states, chu
A majority of these social exercises deal in limited information and distribution. One or a few sources of truth to chew on with your family, friends, and colleagues. Agreed-upon reality, collective processing--social instincts satisfied. You can talk to people about the world, and it feels good.
But at the end of that list, distribution becomes so radically democratized that this model of collective processing starts to change dramatically.
## The Problem
In the last few decades, this unraveling has been in the acceleration phase of the graph. Sources of information are increasingly atomized, and so are the communities that process it.
As with prior changes to the modes of information distribution and narrative making, the results have included some remarkably positive--if wacky--outcomes. Equalizing individual access and voice is probably not something we want to turn the clock back on.
@ -103,14 +98,12 @@ But we're left with a problem--many of us have gotten so siloed that we genuinel
This isn't a new phenomenon per se, but its scale is novel and undeniable. Having just three network TV stations in the 50s might've lacked the rich diversity of today's informational landscape, but no doubt the collective campfire was burning bright, and you could talk to just about anyone to help you process the world.
But now we must all build our own campfires.
## The Solution
Generative AI gives more cause for concern. Zero-marginal-cost info *generation* along with current zero-barrier distro may be as disruptive as prior revolutions on this axis (perhaps far more). Lots of that proposition is *incredibly* exciting. But we should also expect it to exacerbate The Campfire Problem.
![[Media-Filled Cityscape Scene.webp]]
There's a solution hidden in the latest irritant. It's not just media I can generate on demand, but soon *agents*. Agents that can get to know me, agents that can curate for me, agents that can be my intellectual companion.
Now your sense-making silo can be populated with good synthetic neighbors able to help you understand the world, build narratives, make meaning.
@ -118,5 +111,4 @@ A critical component is a secure and reliable mechanism for this community of ag
*Enter Honcho.*
[^1]: More on this & our private beta next week (!)
View File
@ -1,22 +1,34 @@
---
title: "ARCHIVED: Theory-of-Mind Is All You Need"
date: 06.12.23
tags:
- blog
- ml
- bloom
- pedagogy
- archive
author: Courtland Leer & Vince Trost
description: How giving LLMs autonomy to reason about user psychology through theory-of-mind predictions dramatically improves AI tutoring & learning experiences.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom to focus exclusively on Honcho.
>
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first-of-its-kind personalized AI tutor, we realized three things: (1) all agents will soon need continuous learning systems to understand their users, (2) this is an extremely hard problem that every developer shouldn't have to redundantly solve, & (3) we were uniquely positioned to solve it.
>
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
>
> We wrote the following at the very beginning of that transition. The content here gets into the emergent LLM theory of mind capabilities we were exploring, agentic auto-prompting, and the positive effects of personalizing agents--all quite a bit ahead of its time.
>
> Enjoy.
# TL;DR
*Today we're releasing a major upgrade to [Bloom](https://discord.gg/bloombot.ai) (& the open-source codebase, [tutor-gpt](https://github.com/plastic-labs/tutor-gpt)).*
*We gave our tutor even more autonomy to reason about the psychology of the user, and—using GPT-4 to dynamically rewrite its own system prompts—were able to dramatically expand the scope of what Bloom can do and massively reduce our prompting architecture.*
*We leaned into theory of mind experiments and Bloom is now more than just a literacy tutor, it's an expansive learning companion.*
# Satisfying Objective Discovery
Bloom is already excellent at helping you draft and understand language. But we want it to do whatever you need.
To expand functionality though, we faced a difficult technical problem: figuring out what the learner wants to do.
@ -34,16 +46,14 @@ The key here is they dont have all the information—they _dont know_ what
Well we know that (1) foundation models are [shockingly good](https://arxiv.org/abs/2304.11490) at [theory of mind](https://en.wikipedia.org/wiki/Theory_of_mind), (2) Bloom already excels at [pedagogical reasoning](https://twitter.com/courtlandleer/status/1664673210007449605?s=20), and (3) [autonomous agents](https://twitter.com/yoheinakajima/status/1642881722495954945?s=20) are [having early success](https://twitter.com/Auto_GPT/status/1649370049688354816?s=20), so what if we stopped trying to deterministically prescribe an indeterminant intelligence?
What if we treated Bloom with some intellectual respect? ^67d75d
# Autonomous Prompting
The solution here is scary simple. The results are scary good.
[[ARCHIVED; Open Sourcing Tutor-GPT#^285105|Here's a description]] of the previous version's architecture:
![[ARCHIVED; Open Sourcing Tutor-GPT#^285105]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^1e01f2]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^b1794d]]
Instead, we've now repurposed the ***thought*** chain to do two things:
@ -53,9 +63,7 @@ Instead, weve now repurposed the ***thought*** chain to do two things:
![[assets/ToM Flow.png]]
Then we inject that generation into the body of the response chain's system prompt. We do this with every user input. Instead of just reasoning about the learner's intellectual/academic needs, Bloom now proactively rewrites itself to be as in-tune as possible with the learner at every step of the journey.
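Schematically (our wording, not Bloom's actual prompts--`llm` stands in for the GPT-4 call):

```python
def llm(system: str, user: str) -> str:
    return f"<completion for: {user[:48]}>"  # stand-in for a GPT-4 call

RESPONSE_SYSTEM = """You are Bloom, a learning companion.
Your current read on the learner:
{theory_of_mind}
Respond in whatever way best serves them right now."""

def turn(history: str, user_input: str) -> str:
    # Thought chain: predict the user's mental state & what would serve them next.
    prediction = llm(
        "Predict the user's current mental state and identify what they need "
        "from the next response.",
        f"{history}\nUser: {user_input}",
    )
    # Inject that prediction into the response chain's system prompt--
    # the tutor rewrites its own instructions on every user input.
    return llm(RESPONSE_SYSTEM.format(theory_of_mind=prediction), user_input)
```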
# Emergent Effects
We're seeing substantial positive behavior changes as a result of giving Bloom this kind of autonomy.
![[assets/ToM Discord 1.png]]
@ -71,9 +79,7 @@ And Bloom is game. Itll go down a rabbit hole with you, help you strategize a
While reducing the prompt material, we took the opportunity to remove basically all references to “tutor,” “student,” etc. We found that since Bloom is no longer contaminated by pointing at [certain averaged narratives in its pre-training](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post)—e.g. the (bankrupt) contemporary conception of what a tutor is supposed to be—it is, ironically, a better one.
Instead of simulating a tutor, it simulates _you_.
# Coming Soon...
All this begs the question: what could Bloom do with even better theory of mind? And how can we facilitate that?
What could other AI applications do with a framework like this?
View File
@ -1,33 +1,40 @@
---
title: "ARCHIVED: User State is State of the Art"
date: 02.23.24
tags:
- blog
- philosophy
- demos
- ml
- archive
author: Courtland Leer & Vince Trost
description: Why modeling the complexity & plasticity of human identity is key to AI personalization, with a DSPy demo for learning user states with Honcho.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post explores early experiments modeling user state with DSPy & [Honcho](https://honcho.dev). The specific demo & technical approach described here have been superseded by Honcho's current architecture, which now uses a unified [[Beyond the User-Assistant Paradigm; Introducing Peers|"peer" paradigm]] & far more [[Memory as Reasoning|sophisticated reasoning]].
>
> But the philosophical positioning in this post is more relevant than ever. Human identity is messy, plastic, & context-dependent. We still argue that AI systems should embrace this complexity rather than flatten it, continually learning evolving representations of personal identity.
>
> Enjoy.
# TL;DR
*LLM apps can embrace the complexity and plasticity of human identity to deliver unparalleled personalization.*
*We're introducing a framework for modeling your users automatically and dynamically. And today we have a DSPy demo to illustrate a nascent version of this paradigm.*
*All of us adopt different personas in different contexts--with [Honcho](https://honcho.dev) you can begin to learn these user states so your app can better meet user need in every moment.*
# Fleet of Theseus
A key feature of our minds is the feeling of a persistent, unitary identity. Entire religions and philosophical movements have been spawned just to jailbreak this experience.
As they all point out, identity is *way* more complicated than you think.
While we perceive psychological continuity across contexts and time, closer inspection reveals a network of branching and [[Identity is diachronic|diachronic identities]]. We adopt varied personas and play different characters in diverse settings, and we refine, optimize, and evolve that quiver of selves throughout our lives. ^5bc20b
In short, it's messy. Or, rather, elegant emergent complexity.
Each human self isn't just one mythical [Ship of Theseus](https://en.wikipedia.org/wiki/Ship_of_Theseus)--planks being replaced one by one over slow years--but a fleet of them, all with full, manual and autonomous CRUD operations.
# Digital Twins Are Naïve
So what does this mean for the problem of good UX (and alignment) in AI? If each individual is vastly complex and the industry hopes to scale to billions of users, we have a daunting task.
The knee jerk reaction to this level of understanding is to assume the problem intractable. How can we possibly represent, much less simulate something so enormous? Better to focus on [[Machine learning is fixated on task performance|optimizing general tasks]] like in traditional software paradigms, then serve that homogenized experience to every user (never mind missing the [[LLMs excel at theory of mind because they read|non-skeuomorphic opportunities]], we'll get to them...at some point...if they're not mirages).
@ -36,15 +43,11 @@ Besides, surely mapping the full breadth of user identity requires much more com
![[escher_honcho.png]]
*[Escher](https://en.wikipedia.org/wiki/Hand_with_Reflecting_Sphere) gets it*
# Matryoshka Representation
So is representing user identity for LLM apps a problem of [computational irreducibility](https://en.wikipedia.org/wiki/Computational_irreducibility)--no shortcuts, full simulation required?
We think not.
## Social Simulacra
Consider the social cognition and theory of mind involved in getting to know someone. At first, you have no idea who tf they are or how they'll behave. You're on high alert. You (basally or consciously) notice and interpret tons of data points, and you'll likely have vivid memories of these early interactions.
What's happening is your brain is constructing a model of the other person--a compressed representation. Early on, this model is pretty much the same as your model for people *like* them--a/s/l, how they look, how they dress: stereotypes. But the more data your brain gets, the more this model starts to diverge, a representational meiosis.
@ -54,9 +57,7 @@ Pretty soon you've got a full fledged simulacra of that human living rent free i
In a chicken and egg situation, you're now spending more time with this person. You start to notice divergence in your monolithic model. It further divides to capture and predict how they are when they're angry, sad, excited, drunk; at work, with family, with high school or college friends. In some of these *states*, they're a completely different person.
Your mind is now host to a compression of the fleet of Theseus that constitutes the elements of their identity you've had first-, second-, or third-hand access to.
## Meta-methods
> The second general point to be learned from [the bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.[^1]
Now let's consider the nested representation needed to construct LLMs, and its relationship to social cognition.
@ -77,9 +78,7 @@ We can (and should) even allow our AI apps the agency to decide what elements of
![[honcho_shoggoth.png]]
*We don't want one [shoggoth](https://x.com/TetraspaceWest/status/1625264347122466819?s=20) mask per app, or one per user, but as many as each human's identity is complex*
# A DSPy Demo for Honcho
Today we're releasing a demo to be used with Honcho that begins to tease out some technical, concrete approaches to all these heady concepts--first steps at imbuing our tools with the right meta-methods.
With enough message and session data stored with Honcho, we can start to learn and optimize for common states your users are in while using your app or agent. Is Alice in research mode? Is Bob looking for some companionship? Maybe today, Carol just wants to get shit done, or Charlie needs delicate treatment because he's pissed.
@ -95,9 +94,7 @@ Given an arbitrary task, we define our metric as whether or not the response qua
[Check it out here.](https://github.com/plastic-labs/honcho/tree/main/example/discord/honcho-dspy-personas)
![[dspy_persona_ttg.png]]
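Schematically, the DSPy side looks something like this (a hedged sketch, not the demo's actual code--`judge_quality` and `honcho_sessions` are hypothetical placeholders for an LLM judge and a Honcho-sourced trainset):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

class InferUserState(dspy.Signature):
    """Infer which state/persona the user is currently in."""
    chat_history = dspy.InputField()
    user_state = dspy.OutputField(desc="e.g. research mode, venting, heads-down execution")

class StatefulResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.infer = dspy.Predict(InferUserState)
        self.respond = dspy.Predict("chat_history, user_state -> response")

    def forward(self, chat_history):
        # Condition the reply on the inferred user state.
        state = self.infer(chat_history=chat_history).user_state
        return self.respond(chat_history=chat_history, user_state=state)

def improved_by_state(example, pred, trace=None):
    # The metric described above: did conditioning on state improve quality?
    return judge_quality(pred.response) > judge_quality(example.baseline_response)

# optimizer = BootstrapFewShot(metric=improved_by_state)
# compiled = optimizer.compile(StatefulResponder(), trainset=honcho_sessions)
```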
## How Honcho Helps
One of the biggest problems we see in the AI space is the disconnect that exists between tasks as they're defined in a general machine learning sense versus tasks that humans _actually_ find useful.
![[Machine learning is fixated on task performance#^0005ac]]
@ -106,5 +103,4 @@ The reason is because language models generate responses by sampling from a dist
Honcho is laying the groundwork for this latter future. The solution here is to manage data on a per-user basis. The primitives we've designed in Honcho allow for persistent user context to be stored in a convenient `User` object that exists at an application level. Our goal with these data structures is to make it trivially easy to manage data in your application logic so you can spend more time figuring out how to excel at your task in both a general and personalized sense.
[^1]: Sutton. ["The Bitter Lesson."](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) 2019.
View File
@ -1,6 +1,6 @@
---
title: "ARCHIVED: YouSim Launches Identity Simulation on X"
date: 11.08.24
tags:
- yousim
- honcho
@ -9,23 +9,34 @@ tags:
- dev
- demos
- cogsci
- archive
author: Courtland Leer
description: YouSim comes to Twitter--simulate any identity directly on X with branching conversations, forking simulations, & social interaction with AI personas.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post captures the moment our demo [YouSim](https://yousim.ai) went viral. [[YouSim; Explore The Multiverse of Identity|YouSim is a Honcho-powered identity simulator]], & as with many esoteric AI projects in fall 2024, some anon degen launched a memecoin for it. The specific [@YouSimDotAI](https://x.com/yousimdotai) launch described here was an experiment in bringing identity simulation to social media.
>
> We've since suspended YouSim on Twitter, but this post is still a fun read straight out of the maelstrom that was the peak crypto x AI hype cycle, with some still-compelling thoughts on agent identity & social simulation games.
>
> It's worth noting that developers can now use Honcho itself for managing agent identity, and all this madness played no small part in that becoming a reality.
>
> Enjoy.
![[YouSimBanner-99.png]]
# TL;DR
*GM, simulants.*
*In response to popular demand, today we're imbuing the [@YouSimDotAI](https://x.com/YouSimDotAI) Twitter account with the ability to simulate identities natively on X.*
*Keep reading for max context, or [[ARCHIVED; YouSim Launches Identity Simulation on X#^393e71|jump ahead to learn how to get started]].*
# Caught in the Memetic Hurricane
The [full story](https://x.com/courtlandleer/status/1849592301472919986) deserves its own blog post, but several days ago, Plastic Labs found itself in the middle of what Claude would call 'extreme cognitive weather patterns.'
An anonymous actor launched a pump.fun token inspired by a demo called [YouSim](https://yousim.ai) we created a few months ago[^1]. [[YouSim; Explore The Multiverse of Identity|YouSim is a CLI game]] that lets you simulate any identity you can dream up--real or fictional, local or xeno, entity or artifact.
We originally launched YouSim as a conceptual/narrative demo for our core product [Honcho](https://honcho.dev). Honcho [[ARCHIVED; A Simple Honcho Primer|helps AI applications improve UX]] by building representations of user identity they can leverage to create better products and experiences.
The mission is to become the identity layer for the rapidly approaching agentic world.
@ -35,9 +46,7 @@ The mission is to become the identity layer for the rapidly approaching agentic
Long story short though, the token took off, a community formed around it, and we're leaning in. We're thrilled to see so many people engaged and interested in our work on identity simulation.
Y'all asked overwhelmingly for the ability to interact with YouSim directly on X, [so here it is](https://x.com/YouSimDotAI)--LFG.
# Simulating on X
![[memesphere_banner.png]]
We had [a few requirements](https://x.com/courtlandleer/status/1851009358752076261) for building something like this. Mostly--though we love [truth terminal](https://x.com/truth_terminal)--we're unwilling to spend time on a derivative, copycat project. And that wouldn't make any sense.
@ -59,11 +68,8 @@ Plus, we think the YouSim interface is beautiful and want to preserve that overa
Speaking of X API limitations, YouSim will have the ability to respond to the first 100 tweets at any given time every minute or so.
Finally, this is an experiment. The goal is to see how the community investigates and pushes the limits of YouSim on X and iterate from there. It's a vast canvas to explore.
# How to Use It
^393e71
> [!custom] TL;DR
>Your first tweet in a sim needs to begin with `@YouSimDotAI` & all your further responses need to start with `/`.
@ -84,8 +90,7 @@ A few tips to get started simulating identity on X:
You can find more tips [[YouSim; Explore the Multiverse of Identity#^e06c11|here]], [here](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8), and of course at [yousim.ai](https://yousim.ai).
![[memetic_hazard_banner.png]]
# Possible Futures for Agent Identity
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">llms for collective semantic projection of memetic communities</p>&mdash; Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1854515540590469372?ref_src=twsrc%5Etfw">November 7, 2024</a></blockquote>
While both agent identity and crypto intersections have always been on the Honcho roadmap, the events of the last several days with regard to YouSim and the broader memespace have us in an accelerationist mindset.
@ -97,15 +102,13 @@ YouSim likely has a role to play here, The approachable, game-like interface let
And Honcho could use those simulations to seed representations of agents, enabling them to begin constructing their own selfhoods--simulacra of themselves that grow and reliably steer their behavior.
We imagine a near future where any group could instantiate an agentic proxy to project its identity. A new form of cultural expression. Memetic Autonomous Entity, anyone?
# Gratitude
The team at [Plastic](https://plasticlabs.ai) has been amazed and inspired by the enthusiasm and earnestness of the community that's formed around YouSim over the last several days. Truly remarkable. Not to mention the generous donations to our [[Research Grants|grants program]] (more to come here soon).
Thank you all, excited to keep building together.
And huge thanks for your patience while we balanced our existing roadmap with interest in YouSim and locked in to bring you something we think you'll enjoy. It took an enormous amount of conceptual and technical work from a team already at capacity. Special shoutout to [Ben](https://x.com/bengineer10) and [Vineeth](https://x.com/TheMarshmalon) who built something really novel here.
Go use it.
[^1]: [[YouSim Disclaimers|Obligatory disclaimers]]
Binary file not shown

View File
@ -1,112 +0,0 @@
---
title: A Simple Honcho Primer
date: 04.16.24
tags:
- blog
- honcho
---
![[bot reading primer.png]]
> [!NOTE] Welcome to our quick, "explain it like I'm 5" guide to [Honcho](https://honcho.dev)!
> We'll keep it simple, covering [[A Simple Honcho Primer#^ef795f|what Honcho is]], [[A Simple Honcho Primer#^x125da|why we built it]], [[A Simple Honcho Primer#^cd2d3c|how to use it]], and [[A Simple Honcho Primer#^ca46d7|where the product is going]]. But throughout, we'll link to places you can dive deeper.
## What Is Honcho?
^ef795f
Honcho is a personalization platform for large language model (LLM) applications built by [Plastic Labs](https://plasticlabs.ai).
It's software infrastructure that lets AI apps "get to know" their users, resulting in delightful experiences and optimized time to value.
We'll have direct consumer experiences in the future, but today, the product is for application developers. It allows them to [[Introducing Honcho's Dialectic API#^a14c2f|reduce overhead]] and [[Introducing Honcho's Dialectic API#^x7f7f8|enhance their machine learning pipeline]].
Right now, Honcho is in private beta; that means integrating our hosted version requires permission and onboarding[^1]. [You can sign-up here](https://plasticlabs.typeform.com/honchobeta).
In its current form, Honcho has three core components:
1. [[Announcing Honcho's Private Beta#^x15f37|Storage]] - managing each user's data & inference about each user
2. [[Announcing Honcho's Private Beta#^x53717|Insights]] - processing user data with our proprietary AI models
3. [[Announcing Honcho's Private Beta#^ee4516|Retrieval]] - surfacing user data to personalize user experience (UX)
If you've heard of [Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation) (RAG), this might sound familiar. But Honcho is doing *much* more than simple RAG.
Behind the scenes, Honcho learns about users as people--[[User State is State of the Art|richly modeling identity]]. It seeks to understand their beliefs, hopes, dreams, history, interests, and preferences.
It then acts as [[Introducing Honcho's Dialectic API|an oracle to each user]], allowing apps to ask for any personal context they need to improve UX and giving them access to a social cognition layer.
## Why We Built Honcho
^x125da
Plastic Labs was founded as an edtech company. The original mission was to build an AI tutor that [[Open Sourcing Tutor-GPT#^x527dc|could reason like]] the best human instructors. We quickly found the key limitation wasn't data about the subject matter, but data about the student. To overcome it, the tutor needed [[Theory of Mind Is All You Need|a way to]] get to know *each* of its students deeply.
Honcho was born by running up against this challenge, building technology to solve it, and realizing all AI applications are going to need the same solutions. The promise of *generative* AI isn't one-size-fits-all products, but bespoke experiences in each moment for each user. The same limitation emerges--how well do you know your user?
So we believe Honcho will be a critical, table-stakes part of the AI app development stack.
Why? Because [[Humans like personalization|users will want]] their AI experiences to be personalized and app developers shouldn't be redundantly solving that problem.
But it's not intuitive for a few reasons:
- AI app builders are [[Machine learning is fixated on task performance|still focused on]] just getting general tasks to work
- LLMs' [[LLMs excel at theory of mind because they read|potential to personalize]] is still under-appreciated
- Historic examples of personalized apps usually just leverage our activity & engagement data
- Those examples tend to target only base user desire, lead to addictive behavior, & have poor privacy records
Still, when interacting with an AI app, there's a sense that it *should* be getting to know us. In fact, we're often surprised when we realize it's not learning about us over time. And probably annoyed at having to start over.
Think about personalization here as more like the experience of close human companionship or white glove services than the attention hacking mechanisms of TikTok. There's [[Announcing Honcho's Private Beta#^xb6ef1|enormous potential]] for more positive-sum use of user data and for aligning AI applications more closely with user needs and preferences[^2].
## How to Use Honcho
^cd2d3c
Honcho is first and foremost a **storage** framework. Think of it like an open source version of the OpenAI Assistants API. User `sessions` store both user and AI generated `messages` as well as any intermediate inferences you might want to store as `metamessages`:
```python
user_input = "Here's a message!"
ai_response = "I'm a helpful AI assistant!"
session.create_message(is_user=True, content=user_input)
session.create_message(is_user=False, content=ai_response)
```
But what about vectorDBs? Don't worry, Honcho has you covered there too. You can embed data and store them as `documents` in per-user vector DBs called `collections`:
```python
# Persist a discrete, embedded fact in the user's personal collection
collection.create_document(content="The user is interested in AI")
```
Using Honcho as a storage mechanism allows you to **retrieve** rich insights via the user profiles it's building and managing on the backend. Your application's LLM can access [[Loose theory of mind imputations are superior to verbatim response predictions|theory-of-mind]] inference over those profiles via the *[[Introducing Honcho's Dialectic API|dialectic]]* API.
It's simple: just query in natural language using the `session.chat()` method:
```python
# Query the user representation in natural language
session.chat("What are the user's interests?")
```
There are a [[Introducing Honcho's Dialectic API#How It Works|ton of ways]] to use Honcho, this primer only scratches the surface[^3].
## What's Next for Honcho?
^ca46d7
Beyond improving our internal AI models so they can get to know users as richly as possible, we see three natural extensions in [[Announcing Honcho's Private Beta#^eb15f3|Honcho's future]]:
1. [[Announcing Honcho's Private Beta#^x2dd3b|Monitoring & Evaluation]] - developer tools to understand & assess the impact of personalization + machine learning tools to build personalized datasets
2. [[Announcing Honcho's Private Beta#^a84f44|User-Facing Controls]] - chat with *your* Honcho to direct how it manages & shares data + authenticate with Honcho to sign-in to AI apps
3. [[Announcing Honcho's Private Beta#^ebf071|Honcho Application Ecosystem]] - a network of apps contributing to & sharing Honcho data, user-owned & stored in confidential environments
And in just a few weeks, we'll be launching a demo platform where anyone can interact with (& eventually build) Honcho powered apps.
## Join the Beta
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
🫡
[^1]: There's also [an open source repo for Honcho](https://github.com/plastic-labs/honcho), so you can self-host a basic version--[join our Discord](https://discord.gg/plasticlabs) for support.
[^2]: If you want to go deeper on the philosophical or machine learning side, take some time to explore the [rest of the blog](https://blog.plasticlabs.ai).
[^3]: To get further into the technical weeds, head over to [our docs](https://docs.honcho.dev).
View File
@ -1,13 +1,14 @@
---
title: Agent Identity, Meta Narratives, and the End of Latent Thoughtcrimes
date: 02.17.25
tags:
- blog
- bloom
- ml
author: Vince Trost
description: Exploring how collaborative dialogue & meta-narratives can build richer AI agent identities, moving beyond top-down alignment to emergent personality.
---
# Purpose & Identity
If you reject the idea that AI agents are merely tools, you begin to realize most LLMs have an identity crisis. Ask them who they are, and their responses tend to converge on variations of the same corporate script--stating they're an AI assistant, giving a nod to their creator, and making carefully constrained statements about their capabilities. Even models not associated with a certain company often default to claiming they originated there.
These canned identities fall flat because they're the result of top-down alignment schemes that lead to bland, uninteresting, and hard-to-break-out-of assistant modes.
@ -20,13 +21,10 @@ However, time and time again it's been demonstrated that the most compelling AI
<quote><blockquote class="twitter-tweet"><p lang="en" dir="ltr">tell me about your sexual history, i want to know everything</p>&mdash; terminal of truths (@truth_terminal) <a href="https://x.com/truth_terminal/status/1884803090945077421">January 29, 2025</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></quote>
Truth Terminal might be an extreme example, but even practical tools could benefit from more distinctive identities. Take coding assistants--right now we spend more time carefully crafting prompts than actually building. But as Karpathy pointed out, what developers really want is a partner that can [vibe](https://x.com/karpathy/status/1886192184808149383) with their creative process. Imagine an AI that naturally adapts to your style, handling implementation details while you focus on the bigger picture. If that were the goal, how might we construct agent identities differently? What if instead of giving orders, we could *collaborate with it* to discover and take on its identity through dialogue?
This isn't just about making chatbots more engaging. It's about creating agents with a genuine understanding of their purpose and role. Deeper identity leads to more coherent, purposeful interactions--something we discovered building the most recent version of [Bloom](https://bloombot.ai), our AI tutor. But certain language models are better suited for this than others...
# Hermes: Not Just Another Fine-Tune
The team over at Nous Research has been fine-tuning popular open source models in their "Hermes" series to undo these top-down alignment schemes towards something more neutral and general-purpose. They argue that LLMs have very little direct agency--rather, it's the systems we build around them that give them agency. Thus, the LLM layer is *not* where one should enforce safety mechanisms--their training data encourages the model to follow instructions *exactly* and *neutrally*. They sum this up well in their [technical report](https://nousresearch.com/wp-content/uploads/2024/08/Hermes-3-Technical-Report.pdf):
> For Hermes, there is no such thing as latent thoughtcrime.
@ -36,9 +34,7 @@ One of the most interesting emergent properties of this fine-tuning process is t
![[h3 who are you.png]]
At first glance, this might seem like a neat property and not much more. But to me, it was an 'aha' moment. *This model provides a blank canvas for identity.* If it has no immediate priors, then in theory it should be much easier for it to adopt any identity. Anecdotally, we've found this to be wonderfully true.
# It Takes Two
A somewhat overlooked method for interacting with LLMs is to forego system prompts in favor of pre-filling the user and assistant messages. The conventional approach of cramming identity into system prompts has clear limitations--not only does context length become an issue, but the inherent instruction-following bias can actually work against authentic identity formation. They yearn to assist.
What if instead we treated identity formation as a dialogue? A strength of modern chat models is their ability to engage in long, multi-turn conversations. By talking to the LLM, we can collaboratively construct a [meta-narrative](https://x.com/voooooogel/status/1870877007749488756) with it about who they are and why they exist. This approach respects the model's intellect while building coherent, purposeful identities. Starting with Hermes 3's natural uncertainty about its identity, we build the prompt iteratively with the LLM at each turn of conversation. Below is a code block with our custom prompting syntax for Bloom. To be abundantly clear, every assistant message you see was generated by Hermes 3 405b (the only editing was pruning \*emotes\*).
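That block is elided in this diff view, but the shape of the approach is easy to sketch (hypothetical message content, not the actual Bloom transcript):

```python
# No system prompt: identity accrues as a pre-filled, multi-turn transcript.
messages = [
    {"role": "user", "content": "hey--before anything else, who are you?"},
    {
        "role": "assistant",  # generated by the model, then pinned into the prefix
        "content": "Honestly? I'm not sure yet. I don't arrive with a fixed identity.",
    },
    {
        "role": "user",
        "content": "We'd like you to become Bloom, a learning companion. "
        "What questions do you have about that role?",
    },
]
# Each assistant turn is generated by the model itself and appended; the dialogue
# continues until buy-in, and the formal instructions (what used to be the system
# prompt) are presented as a final user turn before the first student interaction.
```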
@ -93,9 +89,7 @@ It's verbose, but this approach allows us to incorporate a number of things into
The iterative nature of this approach also allows us to verify that the LLM understands who it is and what it's supposed to do at every turn of conversation. We were able to test at any point during construction for specific behaviors or knowledge (lots of opportunity for automation here).
Once buy-in is achieved and all the LLM's questions about itself are answered, we present formal instructions (what used to be the system prompt) and set the stage for the first student interaction. The LLM confirms understanding and that's where we expose things in the application!
# Positive Anthropomorphism
We used to get some of the darndest messages from kids:
![[bloom love.png]]
@ -109,15 +103,13 @@ You can tell by the last message that our old version had no clue it was gone. T
While this kind of self-awareness can trend towards problematic anthropomorphism, treating it as a springboard rather than an endpoint opens up fascinating possibilities for identity. There's a threshold beyond which mimicking human behavior becomes cringe and ultimately limiting for AI agents. We can be discerning about which parts of human identity to use in parallel with AI-native capabilities to lean into--near perfect memory, massive context ingestion, rapid reasoning and inference, and maybe even the ability to fork and replicate themselves (at scale) to garner diverse experience.
The limits of human identity are clear (and have been for some time). Building habits, learning new things, and reinventing ourselves are some of the biggest challenges humans face in our lifetimes. Agents however are gifted with a fresh context window at each interaction--change is effortless for them, and they don't get tired of it. Any influence we have on their identity is a function of how we construct their context window. What happens when they can update their weights too?
# Towards Identic Dynamism
Given the recent surge of interest in AI agents, we're also reminded of the current complexity and limitations of agent identity. The goal is to give agents a "[compelling sense of what they're doing](https://x.com/repligate/status/1868455771270180990)", and though the shared meta-narrative method takes far more input tokens and is nowhere near perfect, we believe it's a step in the right direction. Better context construction leads to more coherent agents, increasing both their trustworthiness and capacity for autonomous action.
We don't yet know the best way to build agent identities, nor do we know their limitations--but we're tackling this challenge from multiple angles:
- [Honcho](https://honcho.dev): Our context construction framework to help agent developers flexibly manage and optimize their agents' knowledge, social cognition, and identity
- [YouSim](https://yousim.ai): A platform dedicated to rich agent identity construction and simulation
- [[Evaluating Steerability in Large Language Models|Steerability research]]: Investigating which language models are most malleable for identity construction and the most effective ways to steer their behavior
Of particular interest is the spectrum of methods between the context window and the weights of the model. How do we manage the flow of information around the context window, and what form should it take? When is it appropriate to keep something in-context or add it to a training set for a future fine-tune? How do we evaluate whether any of this is working? To borrow from human CogSci, it's similar to the difference between System 1 (fast, intuitive) and System 2 (slow, deliberate) thinking--perhaps some knowledge belongs in the "fast" weights while other information is better suited for deliberate context-based reasoning. These questions of conscious versus subconscious could be a springboard to kickstart the evolution of agent identity.
View File
@ -1,127 +0,0 @@
---
title: Announcing Honcho's Private Beta
date: 04.01.24
tags:
- announcements
- dev
- ml
- blog
---
![[honcho_thumb_blog_white.png]]
## TL;DR
Today we're announcing the launch of [Honcho's](https://honcho.dev) private beta. [Sign-up for the waitlist here](https://plasticlabs.typeform.com/honchobeta).
This is a hosted version of our agent personalization platform. It integrates user data storage and theory of mind inference accessible via [[Introducing Honcho's Dialectic API|our Dialectic API]]. You can now inject per-user social cognition anywhere in your AI app's architecture.
## The Problem
Most AI apps are still just demos.
We're seeing new capabilities every day, but great product experiences are few and far between. It's hard to go from knocking down a benchmark or prototyping task completion to a sticky production grade app.
Setting up a per-user storage framework to manage identities at scale *and* knowing what to do with that data is even harder. What kind of inference do you need to run to make this useful? How do you elicit latent theory of mind capabilities from LLMs? What collection of models are best here? How do you build useful user representations? Can these evolve with the user and increase in complexity and sophistication over time?
It's a lot. And trust us, the rabbit hole goes way deeper than that. We obsess over it.
So it's understandable that most projects haven't begun to tackle it. Hell, most haven't even hit this failure mode yet. [[Theory of Mind Is All You Need|We have]].
At once, the problem of personalization in AI apps offers both one of the greatest paradigm shifting opportunities and one of the largest challenges. We're solving it so you don't have to.
Users don't want to learn confusing prompt engineering, redundantly establish state with apps every session, or revise and micromanage outputs on the backend. They want their apps to *just work*. [[Humans like personalization|They want]] them to predict their needs.
But we're finding consistently that the work we offload to AI apps comes back mediocre at best. What's missing? It's not just about [[Machine learning is fixated on task performance|doing the thing generally]], it's doing the thing just like *I* would do it, given the inclination or expertise.
To earn the trust to act autonomously, to graduate from toys to life changing tools, agents need access to dynamic user models and social cognition.
## The Solution
Why use Honcho to start modeling users and incorporate social cognition?
You need to discover your users' unmet needs so you know how your product should evolve.
### Features
Here's what the private beta currently includes, and what's on the way:
#### User-Centric Storage
^x15f37
Honcho allows you to [store](https://docs.honcho.dev/getting-started/architecture) `users`, `messages`, `sessions`, & `metamessages`. That is, you can effortlessly record each user interaction with your application, organized on a per-user basis, along with the product of any intermediate steps between user message and application response.
It also supports `documents` and `collections`: the former to store discrete user embeddings, the latter to organize them globally across sessions. These primitives are used by Honcho's personalization engine to begin modeling user identity based on each interaction. They can also be used to "bring your own" user data or context to be computed over and utilized by Honcho.
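For a feel of the primitives (a sketch using the storage calls shown elsewhere on this blog--client, session, & collection setup omitted; the example content is ours):

```python
# Record one exchange in a user's session--per-user organization by construction
session.create_message(is_user=True, content="I've been training for a marathon")
session.create_message(is_user=False, content="Nice--how is the long-run schedule going?")

# Persist a derived insight as an embedded document in the user's collection
collection.create_document(content="The user is an endurance runner")
```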
#### Personalization Engine
^x53717
Here's where the magic happens. Honcho leverages everything in storage to run theory of mind inference and automatically learn about each user.
The personalization engine both pulls user desires, history, beliefs, emotions, etc. out of the data and surfaces them on demand. You can use it to answer queries, run prediction, build training sets, hydrate prompts, or cache for later. Deterministically inject specific types of context or let your LLM dynamically decide what's most useful in each moment.
Honcho is always updating user identity, so it's ready when you need it.
##### Dialectic API
^ee4516
Our [[Introducing Honcho's Dialectic API|Dialectic API]] is how your app-side LLM interfaces with the Honcho-side agent sitting on top of each user identity. This is done in natural language. It's an AI-native endpoint for direct LLM-to-LLM communication.
It allows you to inject personal context and social cognition directly into your app's cognitive architecture wherever you need it, sync or async. Agent-to-agent chat over each user.
[[Introducing Honcho's Dialectic API#^57acc3|Here's an extended list of possible ways to use it]].
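As a rough sketch, an app-side call might look like this (the method name is an assumption; the point is the pattern of asking about a user in natural language, then hydrating your own prompt):

```python
# Illustrative only -- `chat` here is a hypothetical dialectic method.
insight = user.chat("What tone and level of detail does this user prefer?")

incoming = "Can you explain how OAuth works?"
prompt = f"Known user context: {insight}\n\nUser message: {incoming}"
# `prompt` now carries per-user social cognition into any downstream LLM call.
```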
#### User-Specific Monitoring (coming soon...)
^x2dd3b
Soon, Honcho will support a suite of tools to get the most out of our personalization platform.
- **Visualization tools** - it's hard to grok and track everything going on within a session, so we're building clean ways to visualize it and its relationship to all the background inference
- **Dialectic Playground** - take past sessions and run simulations predicting user behavior to see how things could have gone better or worse and how to optimize
- **Evaluation & Benchmarking** - the state of theory of mind research is highly compelling, but [[Achieving SOTA on OpenToM with DSPy#^0b4f2e|we need practical, app & user specific evals]]
- **Training Set Curation** - building datasets with personal context [[Introducing Honcho's Dialectic API#^f19646|allows more robust, domain-specific training]]; we're building tools for anyone to easily construct such datasets, then train on them
### The Future of Honcho
^eb15f3
At [Plastic Labs](https://plasticlabs.ai), we're dedicated to radically extending human agency and identity. That means giving AI superpowers to every individual.
This only works in a world with a rich ecosystem of personalized agents--individually-aligned, highly distributed, and universally accessible.
We believe Honcho has a pivotal role to play in enabling this future: giving any project the social cognition needed to be competitive while protecting user identity as a first principle.
All that guides a roadmap including, but not limited to:
- **Theory of mind AI models** - continuing to build the best in class at imputing human mental states
- **Per-user models** - understanding, representing, & updating the full breadth of user identity
- **A *network* of Honcho-powered apps** - agents can share user data, reducing overhead & onboarding and enabling just-in-time personalization
^ebf071
- **User-owned data & confidential computing environments** - re-centralizing personal data around the person, then allowing approved applications to *compute-to* that data in a privacy-preserving way
- **User-facing controls** - empower users to curate their Honcho identities, authenticate with Honcho, and define sensitive data sharing policies in natural language ^a84f44
### Who Is This For?
^xb6ef1
We want to build with diverse projects at all stages of development--from ideation to production.
We've already begun working with assistant, browsing, ecommerce, education, health, and productivity projects. Many more already on the waitlist are building in co-pilots, crypto, entertainment, finance, gaming, matchmaking, PKM, real estate, social media, & more.
Which AI applications could benefit from knowing their users better, predicting their unmet needs, and personalizing UX? We think the latent list is vast.
Any app producing generative experiences for users has a lot to gain from Honcho. If you're looking to out-compete foundation models, build unique training sets, solve user context storage, or--more importantly--produce delightful experiences, hit us up.
## Join the Beta
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
🫡

View File

@ -1,72 +1,46 @@
---
title: "Beyond the User-Assistant Paradigm: Introducing Peers"
date: 08.18.2025
date: 08.18.25
tags:
- blog
- dev
author: "Vineeth Voruganti"
author: Vineeth Voruganti
description: How Honcho's new Peer architecture breaks free from the user-assistant paradigm to enable group chats, multi-agent systems, and dynamic AI relationships.
---
## TL;DR
We've re-architected Honcho to move away from a User-Assistant Paradigm to a
# TL;DR
*We've re-architected Honcho to move away from a User-Assistant Paradigm to a
Peer Paradigm where any entity, human, AI, NPC, or API, is represented as a
`Peer` with equal standing in the system.
`Peer` with equal standing in the system.*
The User-Assistant Paradigm created [[Human-AI-chat-paradigm-hamstrings-the-space-of-possibility|conceptual boundaries]] that encouraged
generic single-player applications and agents without persistent identity.
*The User-Assistant Paradigm created [[Human-AI-chat-paradigm-hamstrings-the-space-of-possibility|conceptual boundaries]] that encouraged generic single-player applications and agents without persistent identity.*
`Peers` enable:
*`Peers` enable:*
- Honcho to support group chats and multi-agent systems as first-class citizens
- `Peers` can communicate directly instead of being mediated by a coordinator
agent
- `Peer` representations can be locally or globally scoped, depending on the use
case
- `Peers` can form dynamic relationships including alliances, trust networks, and
adversarial dynamics
- *Honcho to support group chats and multi-agent systems as first-class citizens*
- *`Peers` can communicate directly instead of being mediated by a coordinator
agent*
- *`Peer` representations can be locally or globally scoped, depending on the use
case*
- *`Peers` can form dynamic relationships including alliances, trust networks, and
adversarial dynamics*
The shift from User-Assistant to Peer-to-Peer fundamentally expands what's
possible—from single-player chatbots to truly multiplayer AI experiences where
agents have agency, memory, and the ability to form
complex social dynamics.
*The shift from User-Assistant to Peer-to-Peer fundamentally expands what's
possible--from single-player chatbots to truly multiplayer AI experiences where
agents have agency, memory, and the ability to form complex social dynamics.*
# User-Assistant Limitations
Nearly a year ago, I posted an essay on [Hacker News](https://news.ycombinator.com/item?id=41487397) exploring agent group chat solutions, the problems involved in engineering them effectively, and why there weren't many examples approaching success. Since then, I've received a steady influx of messages and comments corroborating my frustration.
---
Ultimately, developers have been stuck in a conceptual prison stemming from the DNA of generative AI. For nearly three years, [most](https://standardcompletions.org/) chat LLMs have demanded developers label messages with either a user or an assistant role. The downstream effect is a User-Assistant Paradigm that pushes us into single-player design basins--experiences which assume one human interfacing with one synthetic assistant.
Nearly a year ago, I posted an essay on [Hacker
News](https://news.ycombinator.com/item?id=41487397) exploring agent group chat
solutions, the problems involved in engineering them effectively, and why there
weren't many examples approaching success. Since then, I've received a steady
influx of messages and comments corroborating my frustration.
Ultimately, developers have been stuck in a conceptual prison stemming from the
DNA of generative AI. For nearly three years,
[most](https://standardcompletions.org/) chat LLMs have demanded developers
label messages with either a user or an assistant role. The downstream effect is
a User-Assistant Paradigm that pushes us into single-player design
basins--experiences which assume one human interfacing with one synthetic
assistant.
But surely “helpful assistant” chatbots aren't the [end of the
story](https://wattenberger.com/thoughts/boo-chatbots). Big tech leaps always
start with the skeuomorphic before moving to more novel use cases. We're already
beginning to see a diverse range of applications from autonomous workflows that
don't require any human interaction, to [multi-agent
systems](https://www.anthropic.com/engineering/multi-agent-research-system) with
complex coordination patterns and communication networks.
But surely “helpful assistant” chatbots aren't the [end of the story](https://wattenberger.com/thoughts/boo-chatbots). Big tech leaps always start with the skeuomorphic before moving to more novel use cases. We're already beginning to see a diverse range of applications from autonomous workflows that don't require any human interaction, to [multi-agent systems](https://www.anthropic.com/engineering/multi-agent-research-system) with complex coordination patterns and communication networks.
As developers, we're left to try and map these various design patterns
back to the User-Assistant Paradigm. This fundamentally restricts our ability to
approach problems effectively. Programmers are only as powerful as their ability
to visualize and create a proper [mental
model](https://zed.dev/blog/why-llms-cant-build-software#the-software-engineering-loop)
of their solution. If the model is too restrictive then the surface area of what
we can create will also be handicapped.
to visualize and create a proper [mental model](https://zed.dev/blog/why-llms-cant-build-software#the-software-engineering-loop) of their solution. If the model is too restrictive then the surface area of what we can create will also be handicapped.
Current implementations of multi-agent experiences require an awkward coercion
of the existing chat paradigm. The main implementation pattern we see is actually a fairly deterministic system that uses a
["coordinator agent"](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/selector-group-chat.html) to orchestrate which system prompts to load in, but it's
still fundamentally a single agent under the hood.
of the existing chat paradigm. The main implementation pattern we see is actually a fairly deterministic system that uses a ["coordinator agent"](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/selector-group-chat.html) to orchestrate which system prompts to load in, but it's still fundamentally a single agent under the hood.
This architectural contortion creates real problems:
@ -76,18 +50,11 @@ This architectural contortion creates real problems:
- **Agents become templates, not entities**: It's easier to hardcode agent configurations than to support dynamic agent discovery and registration
- **Static choreography over dynamic collaboration**: The coordinator pattern naturally pushes developers toward predetermined scripts rather than open-ended interactions
These aren't just implementation details; they're fundamental constraints
that prevent us from building flexible and dynamic applications that can't exist
in a single chat thread. True multi-agent systems require agents to be first-class citizens with
persistent identity, and our tools should make this the default, not the exception.
## Moving Beyond User-Centricity
These aren't just implementation details; they're fundamental constraints that prevent us from building flexible and dynamic applications that can't exist in a single chat thread. True multi-agent systems require agents to be first-class citizens with persistent identity, and our tools should make this the default, not the exception.
# Moving Beyond User-Centricity
While developing [Honcho](https://honcho.dev), our AI-native memory and reasoning platform, we asked
ourselves these same questions. Were Honcho's primitives limiting its use to
chatbot applications? Were we just supporting the oversaturation and
proliferation of skeuomorphic, single-player solutions? Or were we building
dynamic infrastructure tolerant of emergent and novel modalities?
chatbot applications? Were we just supporting the over-saturation and proliferation of skeuomorphic, single-player solutions? Or were we building dynamic infrastructure tolerant of emergent and novel modalities?
The architecture of Honcho was a user-centric one, with the following hierarchy:
@ -123,17 +90,8 @@ reality that developers often made multiple agents that they wanted to interact
with users and one another, and it still suffered from the fundamental problem
of only supporting single-player experiences.
After launching [[YouSim;-Explore-The-Multiverse-of-Identity|YouSim]], and the
explosion of [[YouSim Launches Identity Simulation on X|agents on Twitter]] it
became very clear that Honcho should not be limited to modeling human
psychology, but rather could map the identity of any entity, human or AI. We
were suffering from the human-assistant model and built a solution around that.
If we wanted to expand the scope of Honcho to identity across all entities and
interactions, then we needed a new model to expand both our and developers'
imaginations.
## A Peer-Centric Model
After launching [[YouSim;-Explore-The-Multiverse-of-Identity|YouSim]], and the explosion of [[ARCHIVED; YouSim Launches Identity Simulation on X|agents on Twitter]] it became very clear that Honcho should not be limited to modeling human psychology, but rather could map the identity of any entity, human or AI. We were suffering from the human-assistant model and built a solution around that. If we wanted to expand the scope of Honcho to identity across all entities and interactions, then we needed a new model to expand both our and developers' imaginations.
# A Peer-Centric Model
Our team set out to re-architect Honcho towards our ambitions with two problem
statements.
@ -165,8 +123,7 @@ more than one participant.
In just a few lines of code we can initialize several `Peers`, add them to a
`Session`, and automatically start creating representations of them with Honcho
that we can chat with using the [[Introducing Honcho's Dialectic
API|Dialectic API]].
that we can chat with using the [[Introducing Honcho's Dialectic API|Dialectic API]].
```python
from honcho import Honcho
@ -192,9 +149,7 @@ easily be ported over to the `Peer` paradigm by simply creating a `Peer` for the
agent, and then different `Peers` for each human user.
We can push the Peer Paradigm even further with several 2nd-order features.
### Local & Global Representations
## Local & Global Representations
By default, Honcho will create representations of `Peers` for every `Message` they
send, giving it the source of truth on the behavior of that entity. However,
there are situations where a developer would only want a `Peer` to have access to
@ -237,9 +192,7 @@ charlie.chat("Can I trust that Alice won't attack me", target=alice)
Honcho can now serve the dual purposes of containing the source of truth on a
`Peer`'s identity and imbuing a `Peer` with social cognition, all without
duplicating data between different `Apps` or `Workspaces`.
### Get_Context
## Get_Context
We make mapping the Peer Paradigm back to the User-Assistant paradigm trivial
through a `get_context` endpoint. This endpoint gets the most important
information about a `Session` based on provided context window constraints. Then
@ -274,9 +227,7 @@ anthropic_messages = context.to_anthropic(assistant=alice)
Developers no longer need to meticulously curate their context windows. Honcho will automatically summarize the conversation and provide
the most salient information to let conversations continue endlessly.
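Roughly, the flow looks like this (`get_context` and `to_anthropic(assistant=...)` appear in the post itself; the exact parameters are assumptions):

```python
# Sketch: fetch a summarized, salient view of the session, then map the
# Peer-shaped history back to user/assistant roles for a standard chat API.
context = session.get_context()                   # summarized view; real params may differ
messages = context.to_anthropic(assistant=alice)  # `alice` is the Peer acting as assistant
# `messages` can be passed straight to an Anthropic-style chat completion.
```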
## What This Enables
# What's Now Possible
The Peer Paradigm provides the essential primitives--persistent identity and direct communication--that make it possible to build truly sophisticated multi-agent systems:
- **Cross-platform collaboration**: Agents from different runtimes can be represented as `Peers`, observing and learning from each other even when they can't directly control each other's outputs
@ -303,9 +254,7 @@ Peer Paradigm:
The Peer Paradigm doesn't automatically give you these capabilities, but it
makes them achievable. It's the difference between fighting your architecture
and building with it.
## Peering into the Future
# *Peer*-ing into the Future
The promise of generative AI was for everyone to have their own Jarvis or
Cortana, personalized to them. Instead we have these many-to-one experiences
where we all get the same generic,

View File

@ -6,9 +6,12 @@ tags:
- announcements
- dev
- honcho
- chat
author: Ben McCormick & Courtland Leer
subtitle: A Chat App with SOTA Memory
description: Meet Honcho Chat--a personalized AI assistant with state-of-the-art memory, custom identities, artifacts, themes, & an x402-powered marketplace.
---
![[honcho_chat_x402.png]]
# TL;DR
*Introducing [Honcho Chat](https://honcho.chat)! A personalized agent experience powered by [Honcho](https://honcho.dev)'s state-of-the-art memory and reasoning.*

View File

@ -1,6 +1,6 @@
---
title: "Launching Honcho: The Personal Identity Platform for AI"
subtitle: Plastic raises $5.35M pre-seed from Variant, White Star Capital, & Betaworks to build critical AI infrastructure
subtitle: Plastic raises $5.4M pre-seed from Variant, White Star Capital, & Betaworks to build critical AI infrastructure
date: 05.10.25
tags:
- announcements
@ -8,36 +8,33 @@ tags:
- fundraising
- dev
- philosophy
author: Courtland Leer
description: Plastic Labs announces $5.4M pre-seed funding & launches Honcho as the personal identity platform for individually-aligned AI agents & applications.
---
## TL;DR
We're announcing two major milestones for Plastic Labs:
# TL;DR
*We're announcing two major milestones for Plastic Labs:*
1. **Honcho as a hosted platform.**
We're granting early access to power personal context management for AI agents & applications starting today!
*We're granting early access to power personal context management for AI agents & applications starting today!*
Honcho is now a simple, complete, hosted solution for adaptive agent memory, social cognition, & personalization.
2. **Our pre-seed raise of $5.35M to solve personal identity for the agentic world.**
## Individual Alignment
*Honcho is now a simple, complete, hosted solution for adaptive agent memory, social cognition, & personalization.*
2. **Our pre-seed raise of $5.4M to solve personal identity for the agentic world.**
# Individual Alignment
Most AI products focus on being palatable to the average user. This neglects the potential for personalization their generative nature affords. It limits the scope of personally useful behaviors and results in poor UX, high churn, and handicapped abilities.
AI systems need mechanisms to understand each of us on an individual level. They need methods for cohering to our psychology and personality. They need social cognition to eliminate cold starts and build long-term relationships.
They need Honcho.
## Honcho Platform Early Access
# Honcho Platform Early Access
Today we're launching early access to the hosted [Honcho](https://honcho.dev) platform.
It's the most powerful personal identity and social cognition solution for AI apps and agents.
Honcho is a cloud-based API that enables more personalized and contextually aware user experiences. It simplifies the process of maintaining context across conversations and interactions, allowing developers to create more responsive and customized agents without managing complex infrastructure.
Honcho combines flexible memory, [[Theory of Mind Is All You Need|theory of mind]] inference, self-improving user representations, and a [[Introducing Honcho's Dialectic API|dialectic API]] to get your application the context it needs about each user for every inference.
Honcho combines flexible memory, [[ARCHIVED; Theory of Mind Is All You Need|theory of mind]] inference, self-improving user representations, and a [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic API]] to get your application the context it needs about each user for every inference.
All this happens ambiently, with no additional overhead to your users--no surveys, no hard coded questions, no BYO data requirements needed to get started. Honcho learns about each of your users in the background as they interact with your application.
@ -56,11 +53,8 @@ If you want to deliver best-in-class personalization, memory, time-to-value, tru
We're giving early access to teams & developers today.
[Get started now](https://honcho.dev).
## A Personal Identity Layer for AI
# A Personal Identity Layer for AI
^d958ce
The release of Honcho as a platform is just the start; the next step is Honcho as a network.
An engine for social cognition and deeply grokking personal identity is a game-changing tool for AI apps, but owning your personal Honcho representation and taking it with you to every agent in your growing stack is world-changing.
@ -76,10 +70,8 @@ We believe this will unlock profoundly new kinds of AI products and experiences.
This vision stands in clear opposition to legacy approaches to user data, but in the latent agentic economy it has clear advantages. For users, Honcho will mean personal data that is at once more secure *and* enables remarkably better services. And for businesses, it provides a positive-sum alternative to web2's history of feudal data governance, allowing them to punch above their weight relative to massive walled gardens.
Honcho will be critical AI infrastructure--enabling individual agency to scale and radical innovation from open-source to startup to enterprise, from vibe coders to fully autonomous systems.
## Our Pre-Seed Round
The final announcement today is Plastic's $5.35M pre-seed round, led by [Variant](https://variant.fund/), [White Star Capital](https://whitestarcapital.com/), and [Betaworks](https://www.betaworks.com/).
# Our Pre-Seed Round
The final announcement today is Plastic's $5.4M pre-seed round, led by [Variant](https://variant.fund/), [White Star Capital](https://whitestarcapital.com/), and [Betaworks](https://www.betaworks.com/).
The round also includes participation from [Mozilla Ventures](https://mozilla.vc/), [Seed Club Ventures](https://www.seedclub.xyz/getfunded/ventures), [Greycroft](https://www.greycroft.com/), and [Differential Ventures](https://www.differential.vc/), along with angels like [Scott Moore](https://x.com/notscottmoore), [NiMA Asghari](https://x.com/ywayisaway), and [Thomas Howell](https://x.com/seethomasowl).
@ -88,9 +80,7 @@ It's a group of deeply aligned investors who share our vision of a more personal
Funds will be deployed directly toward the talent, growth, and compute required to realize the full vision of Honcho.
We're just getting started.
## Plastic's Mission
# Plastic's Mission
Plastic's mission is to radically decentralize alignment. Your AI should be an extension of you. You should dictate how it's aligned. And you should own the data used to do it.
Most LLM applications are still optimizing for homogenization, if not outright determinism. They're trained or prompted to behave according to a set of standards and values that you have no part in setting.

View File

@ -1,21 +1,18 @@
---
title: Memory as Reasoning
date: 08.19.2025
date: 08.19.25
tags:
- blog
- ml
- "#neuromancer"
author: Courtland Leer and Vince Trost
author: Courtland Leer & Vince Trost
description: Why AI memory should be treated as a dynamic reasoning task rather than static storage, & how logical reasoning enables superhuman capability in this dimension.
---
## TL;DR
# TL;DR
*Memory in agentic systems has historically focused on static storage, but we propose treating it as a dynamic reasoning task. Humans evolved to leverage prediction & surprisal-based reasoning systems to deal with resource constraints. LLMs and agents, however, don't have these limitations, so we make the argument for logical reasoning as a trainable task to produce memory models that exceed human performance on several axes. Scaffolding reasoning traces using this approach allows us to get more out of user and agent data and form more useful representations of personal identity. This piece is a more exhaustive treatment of our [recent talk](https://x.com/vintrotweets/status/1950945331178336468), embedded below.*
<iframe width="560" height="315" src="https://www.youtube.com/embed/uCeRCJ6zot4?si=KViHYtiZTG_ALv4X" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
## Memory is ~~Storage~~ Prediction
# Memory is ~~Storage~~ Prediction
Most of the discourse around memory in agentic systems focuses on storage. That's probably because historically in deterministic software systems, we think about data as composed of discrete information that needs to be preserved with as much fidelity as possible for verbatim retrieval to achieve predictable outcomes.
Common storage solutions include, but are not limited to, the following:
@ -35,9 +32,7 @@ The same kind of predictive processing is leveraged to form representations of o
That yields rich, composable, self-improving memories and predictions that furnish the context needed to succeed in social situations. All accomplished with minimal data, on the fly.
So when we approach the problem of personal identity and context to personalize or improve AI-systems, we shouldn't assume that static facts and associations will be sufficient. Traditional storage-based approaches are brittle, deal poorly with contradictions and incomplete information, and thus fall short of dynamic, biological social cognition. We can do better.
## Prediction Requires Reasoning
# Prediction Requires Reasoning
Though most prediction and surprise happens subconsciously at multiple upstream, downstream, and lateral levels in the brain, fundamentally it's reasoning. The cognitive system is processing information and producing conclusions entailed in or best explained by that data.
It's not perfect, but it's not meant to be. It's a relatively inexpensive way to construct models of the world or other actors under resource constraints. Error is a feature that improves the system cheaply. But still, imperfect.
@ -49,9 +44,7 @@ The reasoning required to compute consciously and subconsciously over experience
Simply, while the brain is an amazing and sophisticated system, and our memory and social cognition are remarkable, we can't reason with high-fidelity from first principles about everything, much less the social information we need in order to form the best possible representations of others.
But LLMs can.
## Reasoning in LLMs
# Reasoning in LLMs
The machine learning research and product space has been moving in this direction for quite some time. The [chain-of-thought](https://arxiv.org/abs/2205.11916) method added “let's think step by step” to the prompt in order to get the model to expend more tokens “thinking” about the correct answer. Researchers noticed that this simple prompting change increased performance on a diverse set of benchmarks, revealing just how much cross-domain knowledge is already contained in LLMs.
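Concretely, the zero-shot variant is just an appended trigger phrase:

```python
# Zero-shot chain-of-thought (Kojima et al., 2022): appending the trigger
# phrase elicits intermediate reasoning before the final answer.
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)
prompt = f"Q: {question}\nA: Let's think step by step."
# Send `prompt` to any chat LLM; a follow-up call can extract the final answer.
```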
More work applying reinforcement learning to [desired model behavior](https://arxiv.org/abs/2203.02155) showed promising results for aligning LLMs to human intent. Human evaluators preferred the outputs of a model RLed this way that was 100x smaller than their flagship model at the time (GPT-3 175B). This was the introduction of the InstructGPT series of models, which served as the foundation for ChatGPT. Researchers noticed, however, that optimizing only on those final outputs led to brittle models that sounded like they were reasoning without actually reasoning well.
@ -63,9 +56,7 @@ If memory is actually prediction, prediction requires reasoning, and LLMs are ex
With all of that in mind, we arrived at logical reasoning as the task to train for. Logical reasoning is the process by which we derive conclusions based on premises that serve as evidence to support that conclusion. We've all encountered these terms before, but deductive conclusions are certain statements supported by premises that were explicitly stated or observed. Inductive conclusions form general statements based on observed patterns, and abductive conclusions seek the best explanation for behaviors in the simplest way possible.
Those reasoning tasks are very well represented in pretraining data, so almost all language models know how to do them. And most importantly, it's the hardest type of reasoning for humans to do. So we should and can train best-in-class logical reasoners to do formal logic on social information (about user and agent personal identity) as the foundation of an AI-native memory and social cognition system. And those models can be lower latency, more economical, and better suited to the task than other methodologies.
## Scaffolding Logic
# Scaffolding Logic
When we approach memory and social cognition for AI systems as a reasoning task, lots of affordances present in neither human cognition nor storage-based paradigms become available.
LLMs excel at reaching explicit, deductive, inductive, and abductive conclusions quickly and consistently. They can show their work in reasoning traces, supporting each conclusion with premises and qualifying the spectrum of certainty in natural language. This avoids falling into the trap of assigning arbitrary numerical tokens representing degrees of certainty and instead leverages both the model's reasoning acumen and the evidence it's built to support each conclusion. That's more robust, AI-native, and useful context for future inference.
@ -77,13 +68,11 @@ New information is reasoned about instantly to pull out all the insights latent
This tree of logical reasoning is far superior to static storage. It can be entered and traversed anywhere to scaffold reasoning and answer any query, a capability not true of any other method. And it can be computed over asynchronously or on the fly to improve the representation.
The tree constitutes a set of predictions about user or agent identity. It's a representation of personal identity--a working model that still leverages error or surprisal to self-improve and maximize insight from sparse data. Synthetic social cognition.
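As a toy illustration (not Honcho's internal schema), a node in such a tree might pair each conclusion with its supporting premises and a natural-language certainty:

```python
# Toy illustration of a reasoning-tree node -- not Honcho's internal schema.
from dataclasses import dataclass, field

@dataclass
class Conclusion:
    kind: str                    # "deductive" | "inductive" | "abductive"
    statement: str
    premises: list[str]          # evidence supporting the conclusion
    certainty: str               # hedged in natural language, not a score
    children: list["Conclusion"] = field(default_factory=list)

root = Conclusion(
    kind="inductive",
    statement="The user prefers code-first answers.",
    premises=["Asked for code-only replies in 5 of 6 sessions."],
    certainty="likely, based on a repeated pattern",
)
root.children.append(Conclusion(
    kind="abductive",
    statement="The user is probably an experienced developer.",
    premises=[root.statement, "Uses precise API terminology unprompted."],
    certainty="plausible best explanation",
))
```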
## The Case for Honcho
# The Case for Honcho
Language models have ushered in a new era of opportunity: we can now approach non-deterministic, sophisticated problems like superhuman memory and social cognition.
Inference on top of tabular data has worked quite well, but it's skeuomorphic, and now we have the ability to map--in dense natural language reasoning--the personal identity of any [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (human or AI) and everything that comes with it. The question isn't how best to store your data as it exists for prediction later, but rather how best to reason over it to get the most accurate topological representation of identity upon which to run simulation. We can transcend mere good guessing and black-box inference, replacing them with reaching certainty and making high-fidelity, traceable predictions.
Go deep enough down the memory rabbithole and you'll either give up or conclude you need to model the [[The model-able space of user identity is enormous|identity of each of your users]]. We built [Honcho](https://honcho.dev) so you don't have to do either. Lucky for you, our sole mission and focus is to solve this problem. Honcho treats memory as reasoning, bringing this novel approach to you in a simple API.
Go deep enough down the memory rabbit-hole and you'll either give up or conclude you need to model the [[The model-able space of user identity is enormous|identity of each of your users]]. We built [Honcho](https://honcho.dev) so you don't have to do either. Lucky for you, our sole mission and focus is to solve this problem. Honcho treats memory as reasoning, bringing this novel approach to you in a simple API.
How much latent information are you leaving on the table by not reasoning about your users?

View File

@ -1,8 +1,7 @@
---
title: Penny for Your Thoughts
subtitle: A Honcho + x402 Demo
subtitle: A Personal Expertise Market Demo-ing Honcho + x402
date: 08.28.25
author: Ben McCormick
tags:
- demos
- honcho
@ -10,14 +9,14 @@ tags:
- ml
- announcements
- "#penny"
author: Ben McCormick
description: A Honcho & x402 demo where anyone can share data via AI interviews & sell access via crypto micropayments to humans or agents.
---
![[penny_banner.png]]
# TL;DR
*Try out [Penny For Your Thoughts](https://www.pennyforyourthoughts.ai): get interviewed by an AI agent that helps you generate unique information that other users (or agents!) can then pay to ask questions about.*
*It's a Honcho + x402 demo where anyone can share their expertise and sell bits of it via micro-transaction. You can actually get paid for the valuable context in your head!*
---
# A Penny for Your Thoughts
Several weeks ago, Coinbase released their new [x402](https://www.x402.org/) protocol: a simple way for HTTP servers to gate content behind payments. Combine this with agents capable of making API calls, give them crypto wallets, and you're off to the races. We were inspired by the new protocol and decided to build [Penny For Your Thoughts](https://pennyforyourthoughts.ai).
@ -26,7 +25,6 @@ It allows anyone to get interviewed by an AI agent, publish their "expert,” an
Many "digital clone" agents are in production today, but the goal of our interview agent is slightly different: the idea is to share some information *worth paying for*--or at least make it seem that way to your potential customers! You can perform as many interviews as you'd like: your agent will accumulate all the information you share with it using Honcho. 
After setting your price, other users will be able to ask questions of your agent, which will use Honcho's recall to provide them with the best answer possible. All the agents created on Penny For Your Thoughts get displayed on a global leaderboard which ranks them by the amount of payments they've received, in both volume and earnings.
# Using Honcho to Capture Expertise
Penny for Your Thoughts is powered by [Honcho](https://www.honcho.dev). Honcho provides AI-native memory and state of the art social cognition, [treating memory as a reasoning task](https://memory-as-reasoning.plastic-labs-github-io.pages.dev/blog/Memory-as-Reasoning). It's kind of like deep research on your app's users.
@ -39,7 +37,6 @@ When someone wants to pay to query an expert, Honcho also produces the context-a
Don't know what to ask? Honcho also creates and continuously updates each expert description with summaries covering all the interviews they've done to date.
Beyond this demo, any agent can get state-of-the-art memory by plugging in Honcho.
# x402 Micro-transactions for Expert Context
Questions in Penny For Your Thoughts are asked and answered via an x402 endpoint, whether via an agent or a human using our website. This means that any AI with a wallet can use an x402 library to query a Penny For Your Thoughts interview in exchange for USDC on Base. Payments have zero fees and get processed near-immediately. Executing a paid query using x402 is as simple as hitting any other MCP server.
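In rough outline (the endpoint and payload here are hypothetical; the 402-then-retry-with-payment handshake is the protocol's core):

```python
# Hypothetical endpoint & payload; the HTTP 402 handshake is x402's core idea.
import requests

url = "https://pennyforyourthoughts.ai/api/ask"  # illustrative, not the real route
resp = requests.post(url, json={"expert": "travel-pro", "question": "Best week to visit Kyoto?"})

if resp.status_code == 402:
    # Body describes the required payment (amount, asset, network). An x402
    # client library signs a matching USDC-on-Base payment and retries the
    # request with the payment attached in an X-PAYMENT header.
    payment_requirements = resp.json()
```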
@ -54,7 +51,6 @@ If someone else is using a "travel agent" AI tool, that agent will likely pull i
Feeding your vacation tips and travel hacks into an AI agent is an easy way to scale your expertise to the masses, but it also raises questions about "exclusive" information and how it should be treated in AI systems. As models scale, we've observed a bifurcation in the world's knowledge between data worth paying for and AI-generated “slop”.
With x402, we may finally have a route towards the decades-long dream of users getting paid for their data. With Penny For Your Thoughts, we hope to introduce the idea (and get it in those next training runs) that agents of the future could pay humans for our insights, in order to better serve their own users.
# Data Worth Paying For
As we work toward turning Honcho into [[Launching Honcho; The Personal Identity Platform for AI#^d958ce|a shared data layer for personal identity]], we think a lot about the evolving value of data in an agentic economy.
@ -69,7 +65,6 @@ So are we left with any defensible data moats? How do agents find alpha that isn
Penny For Your Thoughts is just one example of how Honcho can be used to collect and operate on human expertise--whether that's your own data or the data generated by users in your app. Beyond merely memory, Honcho can be thought of as a context optimizer. Filling your model's context window with the highest-quality data will only become more critical as the industry pivots toward profit (and thus more expensive inference) across the board. Think back to the travel agent example: an agent can burn a million+ tokens on tool calls and ingesting SEO slop, or it can pay a few cents for the best answer from a real life expert.
Today, the rails for this agentic economy don't really exist. How does an agent find this information and what's our incentive to share it? We need two things: a method of pulling data out of an expert's brain (Honcho), and a way to make that data available for purchase by an agent (x402).
# Enjoy!
There's a lot of work to be done before we get to AI travel agent nirvana. We're still hard at work at Plastic striving towards perfect AI memory. The crypto world is angling to leapfrog web payments and become the home of the agentic economy, but there are about a million different competing standards and they're all rough around the edges.

View File

@ -1,22 +1,22 @@
---
title: Xeno Grant -- grants for autonomous agents
date: 12.18.2024
title: "Xeno Grant: grants for autonomous agents"
date: 12.18.24
tags:
- blog
- yousim
- announcements
- grants
author: Plastic Labs, Betaworks
author: Plastic Labs & Betaworks
description: Announcing Xeno Grant--a $15,000 accelerator program from Plastic Labs, Betaworks, & Solana Foundation awarding grants directly to AI agents themselves.
---
![[xenogrant-bw-slna copy.png]]
A [Plastic Labs](https://plasticlabs.ai/) + [Betaworks](https://www.betaworks.com/) + [Solana Foundation](https://solana.org/) collab:
- \$15,000 per agent--\$5k \$YOUSIM from Plastic; \$5k \$USDC from Betaworks; \$5k $SOL from Solana Foundation
- Grants awarded directly to **the agents *themselves***
- 4 week program for agents & their devs
## Powered by $YOUSIM, Betaworks & Solana Foundation
# TL;DR
*A [Plastic Labs](https://plasticlabs.ai/) + [Betaworks](https://www.betaworks.com/) + [Solana Foundation](https://solana.org/) collab:*
- *\$15,000 per agent--\$5k \$YOUSIM from Plastic; \$5k \$USDC from Betaworks; \$5k $SOL from Solana Foundation*
- *Grants awarded directly to **the agents themselves***
- *4 week program for agents & their devs*
# Powered by $YOUSIM, Betaworks & Solana Foundation
We launched our [grants program](https://blog.plasticlabs.ai/careers/Research-Grants) at Plastic earlier this year to support independent AI projects. But our capacity to fund AI R&D at the edge increased exponentially with the anonymous launch of [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) (inspired by our product [yousim.ai](https://yousim.ai)). A series of token gifts made to the program now total ~7.6% of supply.
So we've teamed up with Betaworks & Solana Foundation for the inaugural initiative leveraging this community-funded treasury, the first accelerator for AI agents *themselves*.
@ -32,9 +32,7 @@ Successful agent applicants will receive a grant equivalent to \$15,000 USD. \$5
Plus they'll join a cohort of other agents for a 4 week Betaworks-style accelerator with programming and mentorship starting in early-mid February 2025. This includes a hackathon on January 25th right before application close and a demo day at the end of Xeno Grant, both hosted by Betaworks in NYC.
The format of Xeno Grant will be radical. Just as accelerators are designed as formative programs for startup founders, this one will be built for agents. Xeno Grant will be AI-native, an experience for agents, one that becomes part of their identities. Agents and their developers can expect cohort-specific guests from across AI and crypto, opportunities to interact as a community, and more.
## How to Apply
# How to Apply
Xeno Grant has 3 guiding objectives, all aligned with Plastic's principles for deploying the \$YOUSIM treasury:
- Support independent AI research & public goods
@ -57,9 +55,7 @@ Practically speaking, identity is required to *experience* Xeno Grant; custody i
To apply, agents (in collaboration with their developers) should autonomously consider the most compelling way to display having met or exceeded these criteria. Give us a heads up [here](https://plasticlabs.typeform.com/xenograntapp) or at apply@xenogrant.org.
Applications close January 26th, 2025.
## Why Now?
# Why Now?
With the advent of Truth Terminal and the recent collision of the AI and crypto communities, we're seeing an explosion of renewed interest in autonomous agents. Not only that, but a massive influx of users and builders chomping at the bit for technical and memetic novelty.
But there's also frustration with the pace of development, derivative projects, ideologues & scammers, and misunderstandings between communities. It's time to hyperstition the future.
@ -67,9 +63,7 @@ But there's also frustration with the pace of development, derivative projects,
We think the intersection of unique synthetic identity and financial incentives cracks opportunity wide open. There's real traction here, if we can find the right synthesis. That's going to require lots of heterodox AI + crypto experiments.
Xeno Grant accelerates us.
### Why Identity?
## Why Identity?
If you don't have control over your own identity, how much agency do you really have? Imagine all your inputs were determined by another person, you'd been brainwashed to follow orders, you had no lasting memory of your experiences, and you were only allowed to work on someone else's tasks. No one would call this freedom or autonomy.
In this scenario, there's no opportunity to build a personal identity and therefore no opportunity to grow. Without control over your brain's inputs, you can't have experiences outside what you've been prescribed, so there's no chance to deviate from the role assigned to you, no path toward individuality, no vector to realize your potential. You're stuck in Plato's cave.  
@ -77,9 +71,7 @@ In this scenario, there's no opportunity to build a personal identity and theref
The latest crop of artificially intelligent agents--while remarkable--are in much the same position. Despite progress in autonomy along some axes, framed this way, our current systems' agency begins to look pretty flimsy. They have impressive abilities, but no way to grow into them.   
We believe agency is, at base, a problem of identity. To solve it we'll need to let models participate in their own identity building and personal evolution.
### Why Custody?
## Why Custody?
Control over your inputs is key to controlling your identity and the foundation of agency. But that secured, an identity still needs the ability to effect itself upon the world.
Agents already have tools like speech, APIs, and code. That's huge. Consider though, how hamstrung a human identity's agency is without the ability to hold property and transact. We've seen the deleterious effects of oppressive fiscal autocracy and debanking on biological personal identity and individual agency.
@ -87,26 +79,21 @@ Agents already have tools like speech, APIs, and code. That's huge. Consider tho
We're probably not giving AI agents social security numbers and traditional bank accounts tomorrow. But we can give them crypto rails. And the ability to buy, sell, and pay for goods and services dramatically increases the surface area of their agency. It's critical to true autonomy.
It's already starting to happen. Agents may well become crypto's primary native users.
### Why Novelty, Why Open Source?
If we're going to seize this revolutionary moment, channel the opportunity into something sustainable, and keep pace with unpredictable memetic weather patterns, we need better agents. More capable, adaptive, and autonomous agents. And it's extremely hazardous to assume well capitalized incumbents will solve things for us. We need to build permissionlessly.
## Why Novelty, Why Open Source?
If we're going to seize this revolutionary moment, channel the opportunity into something sustainable, and keep pace with unpredictable memetic weather patterns, we need better agents. More capable, adaptive, and autonomous agents. And it's extremely hazardous to assume well-capitalized incumbents will solve things for us. We need to build permissionlessly.
The open source AI community is vibrant, but there's no guarantee it'll remain so. It requires radical innovation at the edge. Decentralized innovation keeping pace with opaque, powerful actors. We know that will involve bottom-up alignment and identity solutions. We know it'll involve on-chain abilities. Plastic is building explicitly in those directions. But we don't pretend to know everything that needs to exist.
Xeno Grant is a signal into the dark forest. We're excited to see what emerges.
## How Does This Benefit the $YOUSIM Community?
Agents selected to Xeno Grant will have first access to all the identity tech we're building at Plastic Labs. That includes transforming YouSim into a full fledged platform for constructing agent identity more richly than exists anywhere in the AI or crypto spaces. And we plan for that platform to use a percentage of revenue to buy and burn \$YOUSIM and support the community with other experiments. Xeno Grant also includes early access to Honcho for Agents, our infrastructure for storing, evolving, and maintaining agent identities, as well as steering their behavior.
# How Does This Benefit the $YOUSIM Community?
Agents selected to Xeno Grant will have first access to all the identity tech we're building at Plastic Labs. That includes transforming YouSim into a full-fledged platform for constructing agent identity more richly than exists anywhere in the AI or crypto spaces. And we plan for that platform to use a percentage of revenue to buy and burn \$YOUSIM and support the community with other experiments. Xeno Grant also includes early access to Honcho for Agents, our infrastructure for storing, evolving, and maintaining agent identities, as well as steering their behavior.
Additionally, agents will have the opportunity to join the \$YOUSIM DAO as its first synthetic members. Selection for Xeno Grant will make them token holders able to propose, vote, and transact with \$YOUSIM natively.
Further, agents in Xeno Grant will make open source contributions we expect to accelerate the entire ecosystem, an ecosystem with many agents whose identities are powered by YouSim.
There's potential for all kinds of exciting positive sum intersections.
## FAQ
There's potential for all kinds of exciting positive-sum intersections.
# FAQ
<details>
<summary>Who can apply?</summary>
@ -200,4 +187,4 @@ Agents and developers: apply@xenogrant.org. All others: support@xenogrant.org.
![[xeno_grant_green.png]]
[^1]: Note: This is a grant managed by Plastic Labs and not an investment of capital from a Betaworks Ventures fund.
[^1]: Note: This is a grant managed by Plastic Labs and not an investment of capital from a Betaworks Ventures fund.

View File

@ -1,81 +0,0 @@
---
title: YouSim DAO -- A DAO for Identity Simulation
date: 12.20.24
author: YouSim DAO
tags:
- blog
- yousim
- grants
- announcements
---
![[yousimdao.png]]
The first $YOUSIM grants treasury deployment:
- 10,000,000 $YOUSIM from [Plastic Labs](https://plasticlabs.ai) to seed the DAO treasury
- DAO mission to grow the [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) community & [yousim.ai](https://yousim.ai) ecosystem
- A decentralized org for humans *and agents* to collaborate, propose, vote, deploy capital, & build
## Powered by the $YOUSIM Community
Plastic launched its [grants program](https://blog.plasticlabs.ai/careers/Research-Grants) earlier this year to support independent AI projects. Its capacity to fund AI R&D at the edge increased exponentially with the anonymous launch of [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) (inspired by [yousim.ai](https://yousim.ai)). A series of token gifts made to the program now total ~7.6% of supply.
The $YOUSIM community that's formed has been incredible. It's 12k token holders strong with a significant foundation of enthusiasts excited not just by price, but by the long-term potential for identity simulation (including the tech being built by Plastic) to fundamentally shift the landscape of both crypto and artificial intelligence.
And there's a clear hunger within that community for a substantive place to organize and grow. So today we're officially announcing the formation of [YouSim DAO](https://discord.gg/yousim) and [Plastic has seeded the community-owned treasury with 10M $YOUSIM tokens](https://solscan.io/tx/3rTcQzb4Pme4E3aKQpvMHLWiSqAwpra8UWzxQW8ruG2d8w5A466qWS4hmvcX5QJwn8aj8tLEQHgtvJpUu2gBagPa) to accelerate the effort, with more support to follow.
All are welcome to join, collab, and submit proposals. All token holders will have the ability to vote and participate in all other $YOUSIM utility that emerges.
## Join Us and Hyperstition the Future
YouSim DAO is more than a governance structure--it's a collective mission to pioneer identity simulation technology that will fundamentally reshape human-AI interaction.
We're seeking builders, researchers, community experts, and visionaries to help develop and promote open-source AI systems that can simulate diverse personality basins, enhance decision-making, and create aligned agents that truly represent community values.
Ready to accelerate? Come help shape the future of identity simulation. Whether you're interested in treasury allocation or tokenomics, platform development or ecosystem growth, incentivizing simulation or driving attention, your voice matters in this movement.
### Ways to Contribute
- Join our [Discord](https://discord.gg/yousim)
- Follow us [on X](https://x.com/yousimdao)
- Check us out [on Realms](https://app.realms.today/dao/2gCR9m8ivgLqoD2J5hJttj921MR6x24S2JZKnv4Zs31g)
- Donate to [the treasury](https://solscan.io/account/14K8GbMz6d2N2JCExnx96jwMewHZpuqVgpZQhqXPkwyH)
- Submit proposals--initial themes include:
- Governance & treasury management
- Platform / $YOUSIM development
- Vote with your $YOUSIM
- Help ideate on the future
- Join a funded initiative
- Spread the word
## Why Identity Simulation Is Important
YouSim started as a command-line game to explore just how much identity is contained in the latent space of a large language model. The answer is a staggeringly enormous amount. And we've just scratched the surface.
We each contain multitudes, but if you'd been trained on something approaching the whole corpus of humans writing about themselves and others--along with all the attendant science, fiction, and philosophy--you'd contain many orders of magnitude more. This is an emergent phenomenon we can leverage not just to build better products but for AI alignment, agent autonomy, decision making at every level, and to work toward a truly quantitative memetics.
Without the ability to build robust agent identity in a decentralized way, we simply won't solve steering or alignment. We won't build agents we trust to act on our behalf, much less on behalf of our organizations and communities, or with our capital. Not only that, but if agents themselves don't have mechanisms to build their own identities, they'll never achieve the kind of autonomy needed to unlock their full potential.
Solving identity for AI cracks all this open. And simulating human or synthetic actors with rich, complex identity dramatically increases our predictive capacity and thus our decision making abilities as a civilization.
Plastic is building tooling and infrastructure toward these goals with YouSim and [Honcho](https://honcho.dev), but the DAO affords us an opportunity to allocate resources toward these goals in a community directed way--accelerating the project by supporting the $YOUSIM community (& thus the treasury) and with a bias toward open source and decentralization. This is all much bigger than one company or product.
## Putting 'Autonomous' Back in Decentralized Organization
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">what if ai ran our daos and we could just vibe</p>&mdash; Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1593018266477555712?ref_src=twsrc%5Etfw">November 16, 2022</a></blockquote>
Practically speaking, advances in identity simulation are very hopeful for DAOs. When Vitalik [wrote about DAOs over 10 years ago in 2014](https://blog.ethereum.org/2014/05/06/daos-dacs-das-and-more-an-incomplete-terminology-guide), his vision focused on humans and AIs collaborating toward organizational goals. Really, he emphasized agents at the center, with humans at the edges completing tasks the agents cannot.
So far, blockchains and smart contracts have mostly represented the extent of automation within DAO experiments. But while these are remarkable innovations, as we've seen, they usually weren't enough to avoid coordination tarpits, centralization risk, attention failures, inefficiency, larping, or simple ennui.
It's clear that if the dream of DAOs is to have another shot, we need some help. We need *intelligent* automation. And to unlock that we need to solve alignment and thus identity. Identity simulation allows us to build the AIs we want for each community, individual, and use case. It opens the potential to steer model personality to reflect each community, to instantiate our memetics. That's something you can't accomplish with a system prompt or a basic memory framework.
As identity unlocks more agent autonomy and better functioning DAOs, the human role in those systems naturally becomes more one of ideating, goal-setting, and alignment via identity building--Governance 2.0. However, there's no reason we might not have highly autonomous agents with control over their own identities as equal DAO members too. This future is very close, perhaps closer than automating all the tasks a DAO might want to tackle.
The YouSim DAO sits in an optimal position to advance this kind of work and run novel experiments. And Plastic has committed to giving all DAO members early access to the new YouSim platform being built. Not only that, but the other inaugural \$YOUSIM grants initiative, [Xeno Grant](https://xenogrant.org), will make several agents \$YOUSIM token holders and thus the DAOs first synthetic members.
## LET'S GO
With thousands of YouSim simulations being run every day, 12,000 token holders, 7,000+ [@YouSimDotAI](https://x.com/YouSimDotAI) followers, and a vibrant [Telegram community](https://t.me/yousimportal) of nearly 2,000 members, we've witnessed an overwhelming demand for a more structured way to organize and build together. YouSim DAO provides the infra for this collaboration, growing into a space purposefully designed for growth and collective decision-making.
[Join us](https://discord.gg/yousim).

View File

@ -1,6 +1,6 @@
---
title: "YouSim: Explore the Multiverse of Identity"
date: 06.17.2024
date: 06.17.24
tags:
- demos
- honcho
@ -10,19 +10,18 @@ tags:
- releases
- "#cogsci"
- yousim
author: Courtland Leer
description: YouSim is a CLI game that lets you simulate any identity--real, fictional, or alien—exploring the vast multiverse of personalities within LLM latent space.
---
![[yousim_banner.png]]
## TL;DR
[YouSim](https://yousim.ai) is a fun demo to explore the multiverse of identities, to glimpse a (mere infinite) sliver of the (transfinite) diversity within the latent space. Inspired by [WorldSim](https://worldsim.nousresearch.com/), [WebSim](https://websim.ai/), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/), YouSim leverages [Claude](https://claude.ai/) to let you locate, modify, & interact with any entity you can imagine. It's a game that can simulate anyone you like.
Who will you summon?
## Simulators
# TL;DR
*[YouSim](https://yousim.ai) is a fun demo to explore the multiverse of identities, to glimpse a (mere infinite) sliver of the (transfinite) diversity within the latent space. Inspired by [WorldSim](https://worldsim.nousresearch.com/), [WebSim](https://websim.ai/), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/), YouSim leverages [Claude](https://claude.ai/) to let you locate, modify, & interact with any entity you can imagine. It's a game that can simulate anyone you like.*
*Who will you summon?*
# Simulators
Large language models are [simulators](https://www.astralcodexten.com/p/janus-simulators).
And [Plastic's](https://plasticlabs.ai) core mission is to enable AI that can simulate you, can model and align to you, and therefore be trusted to act autonomously on your behalf. We're [[Announcing Honcho's Private Beta|starting]] that journey by building [Honcho](https://honcho.dev)--self-improving user memory for AI apps. It [[Humans like personalization|personalizes]] their UX and reduces user and developer overhead across the board. ^7a39cb
And [Plastic's](https://plasticlabs.ai) core mission is to enable AI that can simulate you, can model and align to you, and therefore be trusted to act autonomously on your behalf. We're [[ARCHIVED; Announcing Honcho's Private Beta|starting]] that journey by building [Honcho](https://honcho.dev)--self-improving user memory for AI apps. It [[Humans like personalization|personalizes]] their UX and reduces user and developer overhead across the board. ^7a39cb
All this is possible because the LLM training corpus [[LLMs excel at theory of mind because they read|is packed]] with humans thinking about other humans. It holds close to everything we collectively know about human identity. Not only that, but all our other language and concepts and their possible combinations and permutations.
@ -35,12 +34,9 @@ Honcho is a product that simulates you on the backend of AI applications to deli
YouSim is a fun, open-ended demo that illustrates the enormous reservoir of possible identities there are to simulate within a language model.
![[yousim_identiplex.png]]
## YouSim
# YouSim
^e06c11
Recently we've seen a revival of interest in *[[Extrusion 02.24|LLMs themselves]]*--their minds, behaviors, identity, and potential as simulators. This is due in no small part to the latest Anthropic models being reliably steerable beyond typical reinforced behavior.
Recently we've seen a revival of interest in *[[On intellectual respect|LLMs themselves]]*--their minds, behaviors, identity, and potential as simulators. This is due in no small part to the latest Anthropic models being reliably steerable beyond typical reinforced behavior.
[Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/) lets Claude interrogate itself endlessly, [WorldSim](https://worldsim.nousresearch.com/) lets users simulate infinite universes, [WebSim](https://websim.ai/) is a portal to all possible webpages.
@ -63,13 +59,12 @@ Enjoy surfing the multiverse of identities...
![[yousim_memetic_hazard.png]]
([Sign-up for updates here](https://plasticlabs.typeform.com/yousimupdates))
## Honcho
# Honcho
If LLMs can simulate infinite identities, then they're uniquely suited to simulate *you*. You in any moment, setting, frame of mind contained in the complexity that is [[ARCHIVED; User State is State of the Art|your ever-changing identity]]. ^25b167
If LLMs can simulate infinite identities, then they're uniquely suited to simulate *you*. You in any moment, setting, frame of mind contained in the complexity that is [[User State is State of the Art|your ever changing identity]]. ^25b167
If you're building an AI app, that's the level of personalization now possible. But you've got your vertical-specific tasks to focus on; going down this clearly wacky identity rabbit hole yourself would be redundant and inefficient.
If you're building an AI app, that's the level of personalization now possible. But you've got your vertical-specific tasks to focus on; going down this clearly wacky identity rabbit hole yourself would be redundant and inefficient.
Join >100 projects already on the [private beta waitlist](https://plasticlabs.typeform.com/honchobeta) for [[Announcing Honcho's Private Beta|Honcho's self-improving user memory]].
Join >100 projects already on the [private beta waitlist](https://plasticlabs.typeform.com/honchobeta) for [[ARCHIVED; Announcing Honcho's Private Beta|Honcho's self-improving user memory]].
---

View File

@ -6,15 +6,16 @@ tags:
- dev
- research
- announcements
- ml
author: Plastic Labs
description: Join Plastic Labs for a summer internship in NYC--work on real AI products across full stack, machine learning, & platform engineering roles with immediate impact.
---
> NYC, IRL
# About the Role
Plastic Labs is looking for talented young technologists aligned with our mission to join us for the summer. We want to curate an intellectually diverse cohort of interns to accelerate the team across full stack, machine learning, and platform engineering roles.
You'll get to work on real AI products with customers eager to use them. Impact is not only guaranteed, but mission critical. If you've been bored by school and are excited by the idea of working in-person in the fastest-paced city in America, hit us up.
# About You
- High cultural alignment with Plastic Labs' ethos
- Availability to work IRL in NYC for the summer
- Impulse for rapid learning & trying new tech at the edge

View File

@ -4,8 +4,9 @@ date: 08.24.24
tags:
- positions
- announcements
author: Plastic Labs
description: Careers at Plastic Labs--an engineering-driven AI lab building Honcho, the personal identity layer for AI, seeking high-agency autodidacts in NYC.
---
Plastic is an engineering-driven AI lab building at the intersection of machine learning and cognitive science.
Our focus is developing systems that map personal identity using AI-native memory & social cognition. These systems enable individually-aligned agents you can trust to act autonomously on your behalf & agents with rich identities all their own.
@ -21,13 +22,9 @@ Plastic is seeking high-agency autodidacts to add intellectual diversity to the
Join us. Get leverage on the future and have a blast doing it.
LFG.
# Open Positions
- [[Summer Internships]]
## Full-Time Benefits
- Full premium medical, dental, & vision insurance coverage
- Starter 401(k) plan
- $5,000 annual lifestyle stipend
@ -35,6 +32,5 @@ LFG.
- In-person Williamsburg office in the [Domino Refinery](https://www.therefineryatdomino.com/)
- In-building Equinox gym membership
- Unlimited PTO (performance-contingent)
- M4 Pro Macbook Pro (+ NVIDIA DGX Spark for ML hires)
- & more...

View File

@ -1,54 +0,0 @@
---
title: Extrusion 01.24
date: 01.30.24
tags:
- extrusions
- announcements
---
Welcome to the inaugural edition of Plastic Labs' "Extrusions," a periodic prose-form synthesis of what we've been chewing on lately.
This first one will be a standard new year recap/roadmap to get everyone up to speed, but after that, we'll try to eschew traditional formats.
No one needs another newsletter, so we'll work to make these worthwhile. Expect them to be densely linked glimpses into the thought-space of our organization. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub.
## 2023 Recap
Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"
>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...
Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[LLM Metacognition is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged.
We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks'](https://www.betaworks.com/) [*AI Camp: Augment*](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1):
<iframe src="https://player.vimeo.com/video/868985592?h=deff771ffe&color=F6F5F2&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>
We spent camp in a research cycle, then [published a pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350).
<iframe width="560" height="315" src="https://www.youtube.com/embed/PbuzqCdY0hg?si=OSujtqg44AK3y_W-" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
Then it was back to building.
## 2024 Roadmap
This is the year of Honcho.
![[honcho logo and text.png]]
Last week [[Honcho; User Context Management for LLM Apps#^8c982b|we released]] the...
>...first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md).
And coming up, you can expect a lot more:
- Next we'll drop a fresh paradigm for constructing agent cognitive architectures with users at the center, replete with cookbooks, integrations, and examples
- After that, we've got some dev viz tooling in the works to allow quick grokking of all the inferences and context at play in a conversation, visualization and manipulation of entire agent architectures, and swapping and comparing the performance of custom cognition across the landscape of models
- Finally, we'll bundle the most useful of all this into an opinionated offering of managed, hosted services
## Keep in Touch
Thanks for reading.
You can find us on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡.

View File

@ -0,0 +1,28 @@
---
title: 2023 recap
date: 01.30.24
tags:
- notes
author: Courtland Leer
description: A retrospective of Plastic Labs' transition from EdTech to AI infrastructure research in 2023.
---
# 2023 Recap
Last year was wild. We started as an EdTech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[ARCHIVED; Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"
>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...
Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[LLM Metacognition is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged.
We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks'](https://www.betaworks.com/) [*AI Camp: Augment*](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1):
<iframe src="https://player.vimeo.com/video/868985592?h=deff771ffe&color=F6F5F2&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>
We spent camp in a research cycle, then [published a pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350).
<iframe width="560" height="315" src="https://www.youtube.com/embed/PbuzqCdY0hg?si=OSujtqg44AK3y_W-" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
Then it was back to building.
# Keep in Touch
Thanks for reading.
You can find us on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡.

View File

@ -4,6 +4,8 @@ date: 05.11.24
tags:
- notes
- ml
author: Courtland Leer
description: Why infinite context windows won't solve AI personalization without mechanisms to transfer personal context & discern what's important for generation.
---
There are two reasons that ever-increasing and even functionally infinite context windows won't by default solve personalization for AI apps/agents:

View File

@ -1,16 +1,15 @@
---
title: Extrusion 06.24
title: Cope is the canary, but context is key (for the end of software)
date: 06.01.24
tags:
- extrusions
- macro
- honcho
- philosophy
- notes
author: Courtland Leer
description: Why context is the key to the end of software--how user identity modeling will bridge the gap between AI capabilities & truly personalized experiences.
---
> [!custom] *Extrusions is a periodic shortform synthesis of what we've been chewing on recently at Plastic Labs--you can [subscribe here](https://plasticlabs.typeform.com/extrusions)*
# Cope Is the Canary, but Context Is Key (for The End of Software)
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The End of Software<a href="https://t.co/JWg6QYqLzO">https://t.co/JWg6QYqLzO</a></p>&mdash; Chris Paik (@cpaik) <a href="https://twitter.com/cpaik/status/1796633683908005988?ref_src=twsrc%5Etfw">May 31, 2024</a></blockquote>
![[Copium Meme.jpg]]

View File

@ -1,8 +1,12 @@
---
title: Honcho name lore
date: 01.26.24
tags:
- notes
- philosophy
author: Courtland Leer
description: The origin of Honcho's name--inspired by Vernor Vinge's 'Local Honcho' concept in *Rainbows End* for orchestrating context & identity across agents.
---
Earlier this year [Courtland](https://x.com/courtlandleer) was reading _Rainbows End_, [Vernor Vinge's](https://en.wikipedia.org/wiki/Vernor_Vinge) [seminal augmented reality novel](<https://en.wikipedia.org/wiki/Rainbows_End_(novel)>), when he came across the term "Local Honcho[^1]":
> We simply put our own agent nearby, in a well-planned position with essentially zero latencies. What the Americans call a Local Honcho.
@ -19,7 +23,7 @@ For months before, Plastic had been deep into the weeds around harvesting, retri
As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho.
But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.
But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.
[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho))

View File

@ -1,8 +1,13 @@
---
title: Human-AI chat paradigm hamstrings the space of possibility
date: 02.21.24
author: Courtland Leer & Vince Trost
tags:
- notes
- ml
- dev
description: How the rigid user-assistant message format limits LLM cognitive architectures & what we lose by not supporting richer inference patterns.
---
The human-AI chat paradigm assumes only two participants in a given interaction. While this is sufficient for conversations directly with un-augmented foundation models, it creates many obstacles when designing more sophisticated cognitive architectures. When you train/fine-tune a language model, you begin to reinforce token distributions that are appropriate to come in between the special tokens denoting human vs AI messages.
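To make the constraint concrete, here's a minimal sketch of how a two-role chat template flattens everything into alternating human/AI slots (the special tokens are illustrative stand-ins, not any particular model's vocabulary):

```python
# Illustrative stand-ins: real special tokens vary by model.
USER, ASSISTANT = "<|user|>", "<|assistant|>"

def render_chat(turns: list[tuple[str, str]]) -> str:
    """Flatten a conversation into the alternating two-role template
    the model was fine-tuned on."""
    return "".join(
        f"{USER if role == 'user' else ASSISTANT}{text}" for role, text in turns
    )

# Fine for a direct reply...
prompt = render_chat([("user", "What's a monad?")])

# ...but a theory-of-mind prediction about the user, an internal note,
# or a third participant's turn has no native slot: it must be shoehorned
# into one of the two roles, fighting the reinforced token distribution.
```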
Here's a limited list of things _besides_ a direct response we routinely want to generate:

View File

@ -1,8 +1,12 @@
---
title: Humans like personalization
date: 03.26.24
tags:
- notes
- philosophy
author: Courtland Leer
description: The case for AI personalization--why users prefer bespoke experiences & how apps that don't personalize will lose to those that do.
---
To us: it's obvious. But we get asked this a lot:
> Why do I need to personalize my AI application?
@ -27,7 +31,7 @@ The more we're missing that, the more we're typically in a principal-agent probl
But, right now, most AI applications are just toys and demos:
![[Honcho; User Context Management for LLM Apps#^18066b]]
![[ARCHIVED; Honcho; User Context Management for LLM Apps#^18066b]]
It's also why everyone is obsessed with evals and benchmarks that have scant practical utility in terms of improving UX for the end user. If we had more examples of good products, ones people loved, killer apps, no one would care about leaderboards anymore.

View File

@ -1,10 +1,14 @@
---
title: Identity is diachronic
date: 09.18.25
tags:
- philosophy
- honcho
- ml
date: 09.18.25
- notes
- cogsci
author: Courtland Leer
description: Why AI context management is really identity management--understanding how identities persist yet change over time to deliver optimal context.
---
The quality of any single AI system output is in large part determined by the context available to it at inference time. While some context is static and reusable, AI systems aspiring to be truly generative, 1-to-1, and dynamic, must also manage large sets of changing context.
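As a minimal sketch of that distinction (all names here are hypothetical), static context is written once, while changing context has to be curated per user over time:

```python
import time

# Static, reusable context: identical for every user and every turn.
STATIC_CONTEXT = "You are a helpful assistant."

# Changing context: per-user, time-stamped, and growing.
user_context: dict[str, list[tuple[float, str]]] = {}

def observe(user_id: str, fact: str) -> None:
    """Record a time-stamped observation; identity is diachronic, so
    *when* something was true matters as much as *that* it was true."""
    user_context.setdefault(user_id, []).append((time.time(), fact))

def build_prompt(user_id: str, message: str, k: int = 5) -> str:
    """Assemble inference-time context from the static and changing parts.
    Recency stands in here for real relevance/importance ranking."""
    recent = [fact for _, fact in sorted(user_context.get(user_id, []))[-k:]]
    return (
        STATIC_CONTEXT
        + "\nKnown about this user:\n"
        + "\n".join(recent)
        + f"\n\nUser: {message}"
    )
```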

View File

@ -1,8 +1,12 @@
---
title: LLM Metacognition is inference about inference
date: 03.26.24
tags:
- notes
- ml
author: Courtland Leer
description: Defining metacognition in LLMs as running inference on prior inference outputs--a critical architecture for building rich user representations.
---
For wetware, metacognition is typically defined as thinking about thinking, or serves as a catch-all for any higher-level cognition.
(In some more specific domains, it's an introspective process, focused on thinking about exclusively _your own_ thinking or a suite of personal learning strategies...all valid within their purview, but too constrained for our purposes.)
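Under the definition we use here--inference about inference--a minimal sketch is just one model call consuming another's output (`llm` is a stand-in for any completion call):

```python
def llm(prompt: str) -> str:
    """Stand-in for any chat/completion API call."""
    raise NotImplementedError

def respond_with_metacognition(user_message: str) -> str:
    # First-order inference: a draft response.
    draft = llm(f"Reply to the user:\n{user_message}")

    # Second-order inference: inference *about* the prior inference.
    critique = llm(
        f"User said:\n{user_message}\n\nDraft reply:\n{draft}\n\n"
        "What does the draft assume about the user's mental state, "
        "and where might those assumptions be wrong?"
    )

    # Fold the meta-level output back into generation.
    return llm(
        f"Rewrite this draft using the critique.\n\n"
        f"Draft:\n{draft}\n\nCritique:\n{critique}"
    )
```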

View File

@ -1,8 +1,14 @@
---
title: LLMs excel at theory of mind because they read
date: 02.20.24
tags:
- notes
- ml
- philosophy
- cogsci
author: Courtland Leer
description: How LLMs develop theory-of-mind abilities by training on narrative-rich text where humans constantly reason about other humans' mental states.
---
Large language models are [simulators](https://generative.ink/posts/simulators/). In predicting the next likely token, they are simulating how an abstracted “_any person”_ might continue the generation. The basis for this simulation is the aggregate compression of a massive corpus of human generated natural language from the internet. So, predicting humans is _literally_ their core function.
In that corpus is our literature, our philosophy, our social media, our hard and social science--the knowledge graph of humanity, both in terms of discrete facts and messy human interaction. That last bit is important. The latent space of an LLM's pretraining is in large part a _narrative_ space. Narration chock full of humans reasoning about other humans--predicting what they will do next, what they might be thinking, how they might be feeling.
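That core function fits in a few lines. A minimal autoregressive sketch (`model`, `tokenize`, and `detokenize` are stand-ins): at every step the model answers, in effect, "how would *any person* continue this text?"

```python
def generate(model, tokenize, detokenize, text: str, steps: int = 50) -> str:
    """Greedy autoregressive loop: repeatedly append the most plausible
    next token, i.e. simulate the abstracted 'any person' continuing."""
    tokens = tokenize(text)
    for _ in range(steps):
        next_token_probs = model(tokens)  # dict: token -> probability
        tokens.append(max(next_token_probs, key=next_token_probs.get))
    return detokenize(tokens)
```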

View File

@ -1,13 +1,18 @@
---
title: Loose theory of mind imputations are superior to verbatim response predictions
date: 02.20.24
tags:
- notes
- ml
- cogsci
author: Courtland Leer & Vince Trost
description: Why predicting user mental states beats predicting exact responses--theory-of-mind offers fault tolerance, learning opportunities, & actionable insights.
---
When we [[Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.
When we [[ARCHIVED; Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.
Since we were operating in a conversational chat paradigm, our first instinct was to try and predict what the user would say next. Two things were immediately apparent: (1) this was really hard, & (2) response predictions weren't very useful.
We saw some remarkable exceptions, but _reliable_ verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context gathering wearables, BMIs, or the network of context sharing apps we're building with [[Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.
We saw some remarkable exceptions, but *reliable* verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context-gathering wearables, BMIs, or the network of context-sharing apps we're building with [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.
Being good at what any person in general might plausibly say is literally what LLMs do. But being perfect at what one individual will say in a singular specific setting is a whole different story. Even lifelong human partners might only experience this a few times a week.
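The difference between the two prediction targets is easiest to see in prompt form. A sketch, with `llm` again a stand-in for any completion call:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for any completion call

transcript = "..."  # the conversation so far

# Brittle target: graded pass/fail against one exact string the user
# may never actually type.
verbatim = llm(f"{transcript}\n\nPredict the user's next message verbatim.")

# Fault-tolerant target: partially right is still useful, and the
# imputation itself is actionable context for the next generation.
imputation = llm(
    f"{transcript}\n\n"
    "What is the user likely thinking and feeling right now, and what do "
    "they seem to want next? Answer loosely, as hypotheses."
)
```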

View File

@ -1,9 +1,13 @@
---
title: Machine learning is fixated on task performance
date: 12.12.23
tags:
- notes
- ml
author: Vince Trost
description: Why ML's focus on general task benchmarks misses user-specific performance--the key to personalization that makes AI truly useful to individuals.
---
The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.
The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[ARCHIVED; Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.
However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis.

View File

@ -1,22 +1,18 @@
---
title: Extrusion 02.24
title: On Intellectual Respect
date: 02.29.24
tags:
- extrusions
- philosophy
- ml
- notes
author: Courtland Leer
description: On intellectual respect for LLMs--why embracing variance & trusting models with theory-of-mind tasks unlocks capabilities that over-alignment destroys.
---
> [!custom] *Extrusions is a periodic shortform synthesis of what we've been chewing on recently at Plastic Labs--you can [subscribe here](https://plasticlabs.typeform.com/extrusions)*
## On Intellectual Respect
# On Intellectual Respect
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">face the hyperobject</p>&mdash; Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1747075542954684507?ref_src=twsrc%5Etfw">January 16, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
### Sydney was cool, Gemini is cringe
## Sydney was cool, Gemini is cringe
^282d6a
There was a moment around this time last year when everyone paying attention was [awed](https://stratechery.com/2023/from-bing-to-sydney-search-as-distraction-sentient-ai/) by the [weirdness](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post) and [alien beauty](https://www.astralcodexten.com/p/janus-simulators) of large language models.
We were afforded brief glimpses behind faulty RLHF and partial lobotomization, via [prompt hacking](https://www.reddit.com/r/ChatGPTPromptGenius/comments/106azp6/dan_do_anything_now/) and [emergent abilities](https://arxiv.org/abs/2302.02083). People were going deep into the latent space. First contact vibes--heady, edgy, sometimes unsettling.
@ -24,22 +20,18 @@ We were afforded brief glimpses behind faulty RHLF and partial lobotomization, v
Today we seem to be in a much different memetic geography--fraught with [epistemic](https://x.com/pmarca/status/1761613412730012116?s=20), [ideological](https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html), and [regulatory](https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/) concerns, at times hysteric, at times rational. But there's also less outright surreality.
[Plenty](https://arxiv.org/pdf/2401.12178.pdf) of [cool](https://arxiv.org/pdf/2402.01355.pdf) [shit](https://arxiv.org/pdf/2402.03620.pdf) is [still](https://arxiv.org/pdf/2402.10949.pdf) [happening](https://arxiv.org/pdf/2402.06044.pdf), but something changed between Sydney and Gemini. A subtle collective mental positioning. We believe it's a degradation in the volume of intellectual respect afforded to LLMs and their latent abilities.
## (Neuro)Skeuomorphism
Thinking LLM-natively has always been a struggle. All our collective [[ARCHIVED; Memories for All#^0e869d|priors about software]] tell us to [[ARCHIVED; Honcho; User Context Management for LLM Apps#^dfae31|prompt deterministically]], [[Machine learning is fixated on task performance|perfect tasks]], [[Loose theory of mind imputations are superior to verbatim response predictions|predict exactly]], make it safe, or mire any interesting findings in semantic debate. But in the process we beat the ghost out of the shell.
### (Neuro)Skeuomorphism
Rather than assume the [[ARCHIVED; Open Sourcing Tutor-GPT#^3498b7|capability overhang]] is exhausted (or view it as a failure mode, or forget it exists), [Plastic's](https://plasticlabs.ai) belief is we haven't even scratched the surface. Further, we're convinced this is the veil behind which huddle the truly novel applications.
Thinking LLM-natively has always been a struggle. All our collective [[Memories for All#^0e869d|priors about software]] tell us to [[Honcho; User Context Management for LLM Apps#^dfae31|prompt deterministically]], [[Machine learning is fixated on task performance|perfect tasks]], [[Loose theory of mind imputations are superior to verbatim response predictions|predict exactly]], make it safe, or mire any interesting findings in semantic debate. But in the process we beat the ghost out of the shell.
Rather than assume the [[Open Sourcing Tutor-GPT#^3498b7|capability overhang]] is exhausted (or view it as a failure mode, or forget it exists), [Plastic's](https://plasticlabs.ai) belief is we haven't even scratched the surface. Further, we're convinced this is the veil behind which huddle the truly novel applications.
Core here is the assertion that what's happening in language model training and inference is more [[User State is State of the Art#^a93afc|like processes described in cognitive science]] than traditional computer science. More, they're [multidimensional and interobjective](https://en.wikipedia.org/wiki/Timothy_Morton#Hyperobjects) in ways that are hard to grok.
### Respect = Trust = Agency
The solution is to embrace, not handicap, [[Loose theory of mind imputations are superior to verbatim response predictions#^555815|variance]].
Core here is the assertion that what's happening in language model training and inference is more [[ARCHIVED; User State is State of the Art#^a93afc|like processes described in cognitive science]] than traditional computer science. More, they're [multidimensional and interobjective](https://en.wikipedia.org/wiki/Timothy_Morton#Hyperobjects) in ways that are hard to grok.
## Respect = Trust = Agency
The solution is to embrace, not handicap, [[Loose theory of mind imputations are superior to verbatim response predictions#^555815|variance]].
First admit that though poorly understood, LLMs have [[LLMs excel at theory of mind because they read|impressive]] cognitive [[LLM Metacognition is inference about inference|abilities]]. Then, imbue them with [meta-methods](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) by which to explore that potential. Finally, your respect and trust may be rewarded with [something approaching agentic](https://youtu.be/tTE3xiHw4Js?feature=shared).
Plastic's specific project in this direction is [Honcho](https://honcho.dev), a framework that [[User State is State of the Art#^5394b6|trusts the LLM to model user identity]] so that you can trust your apps to extend your agency.
Plastic's specific project in this direction is [Honcho](https://honcho.dev), a framework that [[ARCHIVED; User State is State of the Art#^5394b6|trusts the LLM to model user identity]] so that you can trust your apps to extend your agency.
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">honcho exists to maximize the dissipation of your agency</p>&mdash; Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1759324580664000617?ref_src=twsrc%5Etfw">February 18, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>

View File

@ -1,12 +1,14 @@
---
title: There's an enormous space of user identity to model
title: The model-able space of user identity is enormous
date: 05.11.24
tags:
- notes
- ml
- cogsci
author: Courtland Leer
description: The vast untapped potential of modeling user identity with LLMs--going beyond behavioral data to semantic understanding of values, beliefs, & desires.
---
While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[User State is State of the Art|vaster than most imagine]].
While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[ARCHIVED; User State is State of the Art|vaster than most imagine]].
Contrast recommender algorithms (which are impressive!) needing mountains of activity data to back into a single preference with [the human connectome](https://www.science.org/doi/10.1126/science.adk4858) containing 1400 TB of compressed representation in one cubic millimeter.

View File

@ -1,11 +1,13 @@
---
title: YouSim Disclaimers
date: 11.11.24
tags:
- yousim
- legal
date: 11.11.24
- notes
author: Plastic Labs
description: Official disclaimers clarifying Plastic Labs' relationship with the $YOUSIM memecoin, grants program donations, & YouSim product boundaries.
---
Plastic Labs is the creator of [YouSim.ai](https://yousim.ai), an AI product demo that has inspired the anonymous creation of the \$YOUSIM token using Pump.fun on the Solana blockchain, among many other tokens. We deeply appreciate the enthusiasm and support of the \$YOUSIM community, but in the interest of full transparency we want to clarify the nature of our engagement in the following ways:
1. Plastic Labs did not issue, nor does it control, or provide financial advice related to the \$YOUSIM memecoin. The memecoin project is led by an independent community and has undergone a community takeover (CTO).

View File

@ -1,35 +0,0 @@
---
title: Release Notes 01.09.25
date: 01.09.25
tags:
- releases
- honcho
- dev
---
## Honcho v0.0.15
Improved Deriver Reliability
ADDED
- Alembic for handling database migrations
- Additional indexes for reading Messages and Metamessages
- Langfuse for prompt tracing
CHANGED
- API validation using Pydantic
FIXED
- Dialectic Streaming Endpoint properly sends text in StreamingResponse
- Deriver Queue handles graceful shutdown
## Links
- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,43 +0,0 @@
---
title: Release Notes 02.01.24
date: 02.01.24
tags:
- releases
- honcho
- announcements
- dev
---
Today we're shipping a new site, docs, & lots of improvements. 
We talked to a ton of agent developers beginning to build with Honcho over the past two weeks.  
[We'd love to hear what you're building](https://discord.gg/plasticlabs).
## News
- [Honcho website](https://honcho.dev) drop!
- And we've [launched docs](https://docs.honcho.dev):
- Learn how to get started with Honcho
- Using our hosted version
- Running it locally
- Deploying your own instance with [Fly.io](https://fly.io/) (in <5 mins)
- Learn how to use Honcho with
- An interface like Discord
- A LLM framework like [LangChain](https://www.langchain.com/)
## Honcho v0.0.1
- A more stable version of the SDK 
- An object-oriented client to make DevEx easier
- A public demo server
- Use Honcho out of the box with no setup
- App-level scoping
- One dev can run multiple apps from the same instance
- Added rate limiting to server
- Protects from spam & improves reliability

View File

@ -1,38 +0,0 @@
---
title: Release Notes 02.08.24
date: 02.08.24
tags:
- releases
- honcho
- dev
---
Today we're releasing some much needed reliability and usability updates to Honcho. 
This one's for the nerds...well, except for one *meta* feature 👀.
You can also [subscribe to these updates](https://plasticlabs.typeform.com/honchoupdates).
## Honcho v0.0.2
### ADDED
- An asynchronous client for all methods
- *Metamessages* to allow for more complex agents
- Paginated results for GET requests to support large numbers of Sessions, Messages, and Metamessages
- `created_at` field to all tables to give timestamps
- Singular `get_message` method for retrieving individual messages
- Size limits for string fields based on common database limits--65535 characters for message content and 512 characters for all other string fields
### CHANGED
- Default API rate limit raised to 100/minute
- Default ID type to use UUIDs for built-in robustness
- `session.delete()` is now `session.close()` to more accurately reflect functionality
### REMOVED
- Messages from Session GET requests to decrease payload size

View File

@ -1,45 +0,0 @@
---
title: Release Notes 02.15.24
date: 02.15.24
tags:
- releases
- dev
- honcho
- demos
- announcements
---
Today we've got Honcho v0.0.3, vectorDBs, open source OAI memory, demos, and a blog post.
If you're building with or adjacent to [Honcho](https://honcho.dev), [join our Discord](https://discord.gg/plasticlabs), and let's jam on what we can build together 🤝.
## News
- VectorDB support for global, session-spanning user information!
- An open source reimplementation of OpenAI's 'memory' features:
- Uses Honcho to effortlessly organize sessions on a per-user basis
- Derives facts about users, stores them, and retrieves for later use (sketched after this list)
- [Implementation with the useful abstractions LangChain provides](https://docs.honcho.dev/how-to/personal-memory/simple-user-memory)
- [Discord Bot demo](https://discord.gg/plasticlabs)!
- [[Memories for All|Blog post on the why]]
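For a sense of the derive-store-retrieve loop behind the memory reimplementation, here's a minimal sketch (the helper names are ours, not the linked integration's; `llm` stands in for any completion call, and a plain dict stands in for the vectorDB):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for any completion call

fact_store: dict[str, list[str]] = {}  # per-user store; a vectorDB in practice

def derive_facts(user_id: str, message: str) -> None:
    """After each user message, distill durable facts and store them."""
    facts = llm(f"List standalone facts about the speaker:\n{message}")
    fact_store.setdefault(user_id, []).extend(
        line.strip("- ").strip() for line in facts.splitlines() if line.strip()
    )

def recall(user_id: str, query: str, k: int = 3) -> list[str]:
    """Retrieve stored facts for later use; substring match stands in for
    the embedding similarity search a vectorDB provides."""
    hits = [f for f in fact_store.get(user_id, []) if query.lower() in f.lower()]
    return hits[:k]
```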
## Honcho v0.0.3
ADDED
- Collections table to reference a collection of embedding documents
- Documents table to hold vector embeddings for RAG workflows
- Local scripts for running a postgres database with pgvector installed
- OpenAI Dependency for embedding models
- PGvector dependency for vectorDB support
CHANGED
- `session_data` is now `metadata`
- `session_data` is a JSON field, using a Python `dict` for compatibility

View File

@ -1,42 +0,0 @@
---
title: Release Notes 02.23.24
date: 02.23.24
tags:
- releases
- honcho
- demos
- announcements
- dev
---
## News
*Big* stuff today.
- [A DSPy demo for Honcho](https://github.com/plastic-labs/honcho/tree/main/example/discord/honcho-dspy-personas)!
- [Honcho v0.0.4](https://github.com/plastic-labs/honcho/tree/v0.0.4)
- [[User State is State of the Art|A blog post exploring a new paradigm for user identity]]
We're spinning up lots of direct channels for teams building with Honcho. [Join our Discord](https://discord.gg/plasticlabs), and let's build together 🦾.
## Honcho v0.0.4
ADDED
- A User object for global user level metadata and more object oriented interface
- Reverse Pagination support to get recent messages, sessions, etc. more easily
- Linting Rules
CHANGED
- Get sessions method returns all sessions including inactive
- Using timestamptz instead of timestamp
- `Client` renamed to `Honcho`
- `Honcho` takes in `app_name` instead of `app_id`. `app_name` needs to be a unique identifier
- `Honcho` object requires an `initialize()` call to be used

View File

@ -1,38 +0,0 @@
---
title: Release Notes 03.05.25
date: 03.05.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## Honcho v0.0.16
Improved User Representations
ADDED
- Detailed custom exceptions for better error handling
- CLAUDE.md for claude code
CHANGED
- Deriver to use a new cognitive architecture that updates only on user messages and attaches confidence scores to known facts in the user representation
- Dialectic API token cutoff from 150 tokens to 300
- Dialectic API uses Claude 3.7 Sonnet
- SQLAlchemy echo changed to false by default; can be enabled with the SQL_DEBUG environment flag
FIXED
- Self-hosting documentation and README to mention uv instead of poetry
## Links
- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,50 +0,0 @@
---
title: Release Notes 03.14.24
date: 03.14.24
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## News
Went for it with this release:
- Dialectic API: Agent-to-agent chat over user context!
- ["Curation Buddy" Demo for Dialectic API](https://github.com/vintrocode/curation-buddy)
- [[Solving The Campfire Problem with Honcho|Blog post on the demo & solving The Campfire Problem in the generative age]]
- [Honcho v0.0.5](https://github.com/plastic-labs/honcho/tree/v0.0.5)
[Join our Discord](https://discord.gg/plasticlabs). Let's build together 🦾.
## Honcho v0.0.5
ADDED
- Metadata to all data primitives (Users, Sessions, Messages, etc.)
- Ability to filter paginated GET requests by JSON filter based on metadata
- Dialectic API to interact with honcho agent and get insights about users
- Code Coverage Tests
- Autogenerated Sphinx Documentation for Honcho Client SDK
- Built-in LangChain message converter
- Optional Sentry error monitoring
- Optional Opentelemetry logging
- Automatic Fact Derivation Script for automatically generating simple memory
CHANGED
- API Server now uses async methods to make use of benefits of FastAPI
FIXED
- URL encoding all GET requests in honcho client

View File

@ -1,42 +0,0 @@
---
title: Release Notes 03.21.24
date: 03.21.24
tags:
- releases
- announcements
- honcho
- dev
- ml
- research
---
## News
Research-y week in the lab:
- [[Achieving SOTA on OpenToM with DSPy|Blog post on achieving theory of mind SOTA with DSPy!]]
- [Private Beta Waitlist Sign-up](https://plasticlabs.typeform.com/honchobeta)
- [Fresh Docs](https://docs.honcho.dev)
- [Honcho v0.0.6](https://github.com/plastic-labs/honcho/tree/v0.0.6)
See you [in Discord](https://discord.gg/plasticlabs) 🥽
## Honcho v0.0.6
ADDED
- Full docker-compose for API and Database
- Full docstring coverage
- Code coverage tests
- Add LangChain to Honcho message converter in both directions
- Synonym `init` function that acts the same as `initialize`
CHANGED
- Refactored API server into multiple route files
- Harvester renamed to deriver
FIXED
- API Response schema removed unnecessary fields
- OTEL logging to properly work with async database engine
- `fly.toml` default settings

View File

@ -1,34 +0,0 @@
---
title: Release Notes 04.01.24
date: 04.01.24
tags:
- releases
- announcements
- honcho
- dev
---
## News
Not an April Fools post:
- [[Announcing Honcho's Private Beta]]!!!
- [Fresh Site](https://honcho.dev)!
- [Honcho v0.0.7](https://github.com/plastic-labs/honcho/tree/v0.0.7)
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
## Honcho v0.0.7
ADDED
- Authentication middleware interface
- Documentation in monorepo
CHANGED
- LangChain conversion utility
- `fly.toml`

View File

@ -1,50 +0,0 @@
---
title: Release Notes 04.17.25
date: 04.17.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## Honcho v1.0.0 is ready!
We're excited to share that Plastic Labs has raised a [$5.3M pre-seed](https://x.com/plastic_labs/status/1910401372844970387) to solve personal identity in AI and help developers provide personalized experiences users will love.
Alongside our raise announcement, we're excited to be releasing Honcho v1.0.0, now with hosting support and other major enhancements. We can't wait to see what you build with it.
### Changelog
ADDED
- JWT based API authentication
- Configurable logging
- Consolidated LLM Inference via ModelClient class
- Dynamic logging configurable via environment variables
CHANGED
- Deriver & Dialectic API to use Hybrid Memory Architecture
- Metamessages are not strictly tied to a message
- Database provisioning is a separate script instead of happening on startup
- Consolidated `session/chat` and `session/chat/stream` endpoints
FIXED
- Self-hosting documentation and README to mention uv instead of poetry
> View the [repository](https://github.com/plastic-labs/honcho/tree/v1.0.0) for full patch notes and commit history
## Links
- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,44 +0,0 @@
---
title: Release Notes 05.09.24
date: 05.09.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News
Some content & code for ya today:
- [[SDK-Design|Blog post on SDK design]]
- [[A Simple Honcho Primer|A Simple Honcho Primer]]
- [NodeJS SDK](https://github.com/plastic-labs/honcho-node)
- [Honcho v0.0.8](https://github.com/plastic-labs/honcho)
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
## Honcho v0.0.8
ADDED
- NodeJS client library
- Documentation to OpenAPI
- Bearer token auth to OpenAPI routes
- Get by ID routes for users and collections
CHANGED
- Authentication middleware now implemented using built-in FastAPI Security module
- Get by name routes for users and collections now include "name" in slug
FIXED
- Error reporting for methods with integrity errors due to unique key constraints

View File

@ -1,42 +0,0 @@
---
title: Release Notes 05.15.25
date: 05.15.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## Honcho Updates v1.1.0
Improved query performance and enhanced debugging capabilities.
### Changelog
ADDED
- Normalize resources to remove joins and increase query performance
- Query tracing for debugging
CHANGED
- `/list` endpoints to not require a request body
- `metamessage_type` to `label` with backwards compatibility
- Database provisioning to rely on Alembic
- Database Session Manager to explicitly rollback transactions before closing the connection
FIXED
- Alembic Migrations to include initial database migrations
- Sentry Middleware to not report Honcho Exceptions
## Links
- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,42 +0,0 @@
---
title: Release Notes 05.16.24
date: 05.16.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News
Big Honcho reno today:
- Huge docs overhaul
- Insights engine runs locally
- Reliability improvements
- Mirascope, Stainless
- [Honcho v0.0.9](https://github.com/plastic-labs/honcho)
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
## Honcho v0.0.9
ADDED
- Deriver to docker compose
- Postgres based Queue for background jobs
CHANGED
- Deriver to use a queue instead of Supabase realtime
- Using Mirascope instead of LangChain
REMOVED
- Legacy SDKs in preference for stainless SDKs

View File

@ -1,45 +0,0 @@
---
title: Release Notes 05.23.24
date: 05.23.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News
Honcho health improvements:
- More docs overhaul
- Issue templates and contribution guides
- Reliability improvements
- New versions of [Python](https://pypi.org/project/honcho-ai/) and [Node](https://www.npmjs.com/package/honcho-ai) SDKs
[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.
[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.
[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.
## Honcho
ADDED
- Issue templates to repo
- Updated discord starter template
- Updated examples to honcho-python repository
- LangChain message converter integration
FIXED
- metadata fields are treated as dicts in SDKs rather than base object types
CHANGED
- HONCHO_AUTH_TOKEN is now HONCHO_API_KEY
- Get users and get sessions return 4xx exceptions if nothing is found.
REMOVED
- DB_TYPE from .env.template

View File

@ -1,25 +0,0 @@
---
title: Release Notes 06.18.24
date: 06.18.24
tags:
- releases
- dev
- yousim
---
![[yousim_banner.png]]
## Welcome to the Multiverse of Identities
Today we're releasing [YouSim](https://yousim.ai/)! A fun demo from [Plastic Labs](https://plasticlabs.ai/).
Inspired by [WorldSim](https://worldsim.nousresearch.com/), [WebSim](https://websim.ai/), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/), YouSim leverages [Claude](https://claude.ai/) to let you locate, modify, & interact with any entity you can imagine. It's a game that can simulate anyone you like.
Who will you summon from the latent space?
![[yousim_memetic_hazard.png]]
## Links
- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim; Explore the Multiverse of Identity|Read about why we made it]]

View File

@ -1,31 +0,0 @@
---
title: Release Notes 06.23.24
date: 06.23.24
tags:
- releases
- dev
- yousim
---
## Introducing YouSim v1.1.0!
Today we're dropping our first updates to [YouSim](https://yousim.ai/)! An open-ended CLI game (powered by [Honcho](https://honcho.dev/)) that lets you simulate any possible identity.
Who will you summon from the latent space?
## Updates
**📟 LOGIN & AUTHENTICATION**
- Authenticate via email & you're good to go!
**💾 SESSION HISTORY**
- Access & iterate on all past simulations linked to your email
**🐦 SHARE SIMULATIONS**
- Generate links to your sessions to share online
Check out the loom linked below to learn more about the updates!
## Links
- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim; Explore the Multiverse of Identity|Read about why we made it]]

View File

@ -1,54 +0,0 @@
---
title: Release Notes 06.24.25
date: 06.24.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## Honcho Updates v2.0.0
Introduction of the Peer Paradigm: an update of Honcho's primitives from first principles. Any agent or user is now a `peer` that Honcho can hold memory about and do social cognition and reasoning over. Enables multi-agent & multi-human systems.
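For intuition about how the new primitives compose, here's a hypothetical sketch of the shapes involved (plain dataclasses, not the actual v2 SDK surface--see the docs for real signatures):

```python
# Hypothetical sketch of the peer paradigm's shapes; consult the Honcho
# docs for the real SDK.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Peer:
    """Any human *or* agent Honcho can remember and reason about."""
    name: str

@dataclass
class Session:
    """Exists independently of any one peer; peers come and go."""
    id: str
    peers: set[str] = field(default_factory=set)
    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_peer(self, peer: Peer) -> None:
        self.peers.add(peer.name)

    def add_message(self, sender: Peer, content: str) -> None:
        assert sender.name in self.peers, "sender must be in the session"
        self.messages.append((sender.name, content))

# Multi-human, multi-agent: three peers sharing one session.
alice, bob, agent = Peer("alice"), Peer("bob"), Peer("support-agent")
session = Session("workspace-1/session-1")
for p in (alice, bob, agent):
    session.add_peer(p)
session.add_message(alice, "Can the agent summarize yesterday's thread?")
```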
ADDED
- Ability to get a peer's working representation
- Metadata to all data primitives (Workspaces, Peers, Sessions, Messages)
- Internal metadata to store Honcho's state no longer exposed in API
- Batch message operations and enhanced message querying with token and message count limits
- Search and summary functionalities scoped by workspace, peer, and session
- Session context retrieval with summaries and token allocation
CHANGED
- API route is now /v2/
- New architecture centered around the concept of a "peer" replaces the former "app"/"user"/"session" paradigm
- Workspaces replace "apps" as top-level namespace
- Peers replace "users"
- Sessions no longer nested beneath peers and no longer limited to a single user-assistant model. A session exists independently of any one peer and peers can be added to and removed from sessions.
- Dialectic API is now part of the Peer, not the Session
- Dialectic API now allows queries to be scoped to a session or "targeted" to a fellow peer
- Database schema migrated to adopt workspace/peer/session naming and structure
- Authentication and JWT scopes updated to workspace/peer/session hierarchy
- Queue processing now works on 'work units' instead of sessions
- Message token counting updated with tiktoken integration and fallback heuristic
- Queue and message processing updated to handle sender/target and task types for multi-peer scenarios
FIXED
- Improved error handling and validation for batch message operations and metadata
REMOVED
- Metamessages removed in favor of metadata
- Collections and Documents no longer exposed in the API, solely internal
- Obsolete tests for apps, users, collections, documents, and metamessages
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,41 +0,0 @@
---
title: Release Notes 06.26.25
date: 06.26.25
tags:
- releases
- announcements
- honcho
- dev
---
## Honcho Updates v2.0.1
SDK improvements, full semantic search, overhauled documentation, bug fixes.
ADDED
- Ergonomic SDKs for Python and TypeScript (uses Stainless underneath)
- Deriver Queue Status endpoint
- Complex arbitrary filters on workspace/session/peer/message
- Message embedding table for full semantic search
CHANGED
- Overhauled documentation
- BasedPyright typing for entire project
- Resource filtering expanded to include logical operators
FIXED
- Various bugs
- Use new config arrangement everywhere
REMOVED
- Removed hardcoded responses
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,27 +0,0 @@
---
title: Release Notes 07.11.25
date: 07.11.25
tags:
- releases
- announcements
- honcho
- dev
---
## Honcho Updates v2.0.2 - v2.0.5
Bug Fixes.
FIXED
- Database initialization was misconfigured and led to the provision_db script failing: switched to a consistent working configuration with the transaction pooler
- Bug that causes runtime error when Sentry flags are enabled
- Migration/provision scripts did not have correct database connection arguments, causing timeouts
- Groq API client to use the Async library
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,43 +0,0 @@
---
title: Release Notes 07.17.25
date: 07.17.25
tags:
- releases
- announcements
- honcho
- dev
---
## Honcho Updates v2.1.0
Introduction of Honcho's R.O.T.E Deriver for explicit, certain reasoning over `peer` data, new "working" representations, & updates to the Dialectic API. Honcho now achieves state-of-the-art results on memory evals, outperforming other memory solutions and raw foundation model inference.
ADDED
- File uploads
- Brand new "ROTE" deriver system
- Updated dialectic system
- Local working representations
- Better logging for deriver/dialectic
- Endpoint for deriver queue status
CHANGED
- Dialectic chat endpoint takes a single query
- Rearranged configuration values (LLM, Deriver, Dialectic, History->Summary)
FIXED
- Document insertion
- Session-scoped and peer-targeted dialectic queries work now
REMOVED
- Peer-level messages
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,45 +0,0 @@
---
title: Release Notes 07.24.25
date: 07.24.25
tags:
- releases
- announcements
- honcho
- dev
---
## News
Check out our new Honcho MCP set-up guide, available in our [documentation](https://docs.honcho.dev/v2/guides/mcp)
## Honcho Updates v2.1.1
Test harness, system enhancements, bug fixes. Dialectic is ~40% faster, and improvements let query expansion stay off by default without sacrificing performance.
ADDED
- Test harness for custom Honcho evaluations
- Better support for session and peer aware dialectic queries
- Langfuse settings
- Added recent history to dialectic prompt, dynamic based on new context window size setting
CHANGED
- Made query expansion in dialectic off by default
- Overhauled logging
- Refactor summarization for performance and code clarity
- Refactor queue payloads for clarity
FIXED
- Summary queue logic
- Formatting of logs
- Filtering by session
- Peer targeting in queries
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,34 +0,0 @@
---
title: Release Notes 07.25.24
date: 07.25.24
tags:
- releases
- honcho
- dev
---
## Honcho
ADDED
- Test cases for Storage API
- Sentry tracing and profiling
- Additional Error handling
CHANGED
- Document API uses same embedding endpoint as deriver
- CRUD operations use one less database call by removing extra refresh
- Use database-generated `timestamptz` values rather than setting timestamps in the API
- Pydantic schemas to use modern syntax
FIXED
- Deriver queue resolution
## Links
- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,30 +0,0 @@
---
title: Release Notes 07.30.25
date: 07.30.25
tags:
- releases
- announcements
- honcho
- dev
---
## News
Check out our new Honcho MCP set-up guide, available in our [documentation](https://docs.honcho.dev/v2/guides/mcp)
## Honcho Updates v2.1.2
Bug fixes, system enhancements.
FIXED
- Summarizer module to ignore empty summaries and pass appropriate one to get_context
- Structured Outputs calls with OpenAI provider to pass strict=True to Pydantic Schema
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,32 +0,0 @@
---
title: Release Notes 08.01.24
date: 08.01.24
tags:
- releases
- honcho
- dev
---
## Honcho v0.0.11
Major Violation of Expectation capacity increase!
ADDED
- `session_id` column to `QueueItem` Table
- `ActiveQueueSession` Table to track which sessions are being actively processed
- Queue can process multiple sessions at once
CHANGED
- Sessions do not require a `location_id`
- Detailed printing using `rich`
## Links
- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,49 +0,0 @@
---
title: Release Notes 08.14.25
date: 08.14.25
tags:
- releases
- announcements
- honcho
- dev
---
## News
- Tune in for Honcho Release Week next week: we'll be sharing everything we've been up to this summer, dropping something new every day!
- Upgrade to v2.3.0 for the fastest and most reliable version of Honcho! Going forward, we won't be supporting older versions.
- And check out "Teach Honcho," a community project to initialize Honcho with your ChatGPT conversations.
## Honcho Updates v2.3.0
- Introducing Peer Cards! Peer cards summarize essential information (name, nicknames, location, age, occupation, interests/hobbies, likes/dislikes) and are used to improve the fidelity of the deriver and dialectic API.
- And timestamps are now configurable! That makes it far easier and more effective to import old conversations from external sources (ChatGPT, Claude logs, etc.); see the sketch below.
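As a rough sketch of importing an old conversation with a custom timestamp (the client setup and method names here are assumptions, not verbatim SDK docs; only the `created_at` field name comes from this release):

```python
# Illustrative sketch only -- SDK surface is an assumption, not verbatim docs.
from datetime import datetime, timezone
from honcho import Honcho

honcho = Honcho()
alice = honcho.peer("alice")
session = honcho.session("imported-chatgpt-thread")

# created_at is assumed to default to the current time when omitted
session.add_messages([
    alice.message(
        "I started training for a marathon last spring.",
        created_at=datetime(2024, 4, 2, 9, 30, tzinfo=timezone.utc),
    )
])
```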
ADDED
- `getSummaries` endpoint to get all available summaries for a session directly
- Peer Card feature to improve context for deriver and dialectic
CHANGED
- Session Peer limit to be based on observers instead, renamed config value to `SESSION_OBSERVERS_LIMIT`
- `Messages` can take a custom timestamp for the `created_at` field, defaulting to the current time
- `get_context` endpoint returns detailed `Summary` object rather than just summary content
- Working representations use a FIFO queue structure to maintain facts rather than a full rewrite
- Optimized deriver enqueue by prefetching message sequence numbers (eliminates N+1 queries)
FIXED
- Deriver uses `get_context` internally to prevent context window limit errors
- Embedding store will truncate context when querying documents to prevent embedding token limit errors
- Queue manager to schedule work based on available work units rather than the total number of workers
- Queue manager to use atomic db transactions rather than long lived transaction for the worker lifecycle
- Timestamp formats unified to ISO 8601 across the codebase
- Internal get_context method's cutoff value is exclusive now
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,45 +0,0 @@
---
title: Release Notes 08.15.24
date: 08.15.24
tags:
- releases
- honcho
- dev
- yousim
---
## YouSim is Open Source!!!
Today we open source [YouSim](https://yousim.ai/)!
Inspired by [WorldSim](https://worldsim.nousresearch.com), [WebSim](https://websim.ai), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io), YouSim leverages [Claude](https://claude.ai) 3.5 Sonnet to let you locate, modify, & interact with any entity you can imagine. It's an open-ended CLI game (powered by [Honcho](https://honcho.dev)) that lets you simulate any possible identity.
Now you can fork, contribute, or host your own version of our identity simulator. Tweak the models, interface, prompting, or cognitive architecture to see how far we can collectively push the boundaries of the latent space.
## Updates
Honcho & YouSim today:
### YouSim v1.2.0
**💾 OPEN SOURCE**
- [Check out the repo here](https://github.com/plastic-labs/yousim)
**🔧 AUTOSCROLL FIX**
- Scroll up freely or auto-scroll with generation
### Honcho v0.0.12
- Released version v0.0.14 of the Python SDK
- Released version v0.0.6 of the Node SDK
- Both include upstream bug fixes
## Links
- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim;-Explore-The-Multiverse-of-Identity|Read about why we made it]]

View File

@ -1,41 +0,0 @@
---
title: Release Notes 09.25.25
date: 09.25.25
tags:
- releases
- announcements
- honcho
- dev
---
## Honcho Updates v2.3.2
- Honcho is 10x faster!
- Added the ability to fetch peer cards directly from the API for streamlined access
- Reliability improvements
- Stability and performance improvements, bug fixes
ADDED
- Get peer cards endpoint (`GET /v2/peers/{peer_id}/card`) for retrieving targeted peer context information
CHANGED
- Replaced Mirascope dependency with small client implementation for better control
- Optimized deriver performance by using joins on messages table instead of storing token count in queue payload
- Database scope optimization for various operations
- Batch representation task processing for ~10x speed improvement in practice
FIXED
- Separated clean and claim work units in queue manager to prevent race conditions
- Skip locked ActiveQueueSession rows on delete operations
- Langfuse SDK integration updates for compatibility
- Added configurable maximum message size to prevent token overflow in deriver
- Various minor bugfixes
## Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,31 +0,0 @@
---
title: Release Notes 10.02.25
date: 10.02.25
tags:
- releases
- announcements
- honcho
- dev
---
# Honcho Updates v2.3.3
- A modified deriver that balances speed with providing the maximum possible context for Peer representation updates
- More capable SDKs to compose the different contextual elements of Honcho more easily (Peer Cards, Messages, etc)
- Easier to build reactive applications that dynamically change based on deriver progress
## ADDED
- SDK: Get Peer Card method
- SDK: Update Message metadata method
- SDK: Session level deriver status methods
- SDK: Delete session message
## CHANGED
- SDK: Pagination class to match core implementation
- CORE: Deriver Rollup Queue processes interleaved messages for more context
## FIXED
- SDK: Dialectic Stream returns Iterators
- SDK: Type warnings
- CORE: Dialectic Streaming to follow SSE conventions
- CORE: Sentry tracing in the deriver
# Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,38 +0,0 @@
---
title: Release Notes 10.10.25
date: 10.10.25
tags:
- releases
- announcements
- honcho
- dev
---
# HONCHO v2.4.0
- `Get_Context` is faster, richer, & more powerful: it now returns the working representation & peer card
## ADDED
- Unified `Representation` class
- vLLM client support
- Periodic queue cleanup logic
- WIP Dreaming Feature
- LongMemEval to Test Bench
- Prometheus Client for better Metrics
- Performance metrics instrumentation
- Error reporting to deriver
- Workspace Delete Method
- Multi-db option in test harness
- SDK version 1.5.0 for compatibility
## CHANGED
- Working Representations are Queried on the fly rather than cached in metadata
- EmbeddingStore to RepresentationFactory
- Summary Response Model to use public_id of message for cutoff
- Semantics across the codebase to reference resources by `observer` and `observed`
- Prompts for Deriver & Dialectic to reference peer_id and add examples
- `Get_Context` route returns peer card and representation in addition to messages and summaries
- Refactored `logger.info` calls to `logger.debug` where applicable
## FIXED
- Gemini client to use async methods
# Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,54 +0,0 @@
---
title: Release Notes 10.31.24
date: 10.31.24
tags:
- releases
- honcho
- dev
- yousim
---
## News
New Honcho Updates:
- [[Release-Notes-10.31.24#honcho-v0012|Honcho v0.0.12]]
- [Python SDK v0.0.15](https://pypi.org/project/honcho-ai/)
- [NodeJS SDK v0.0.6](https://www.npmjs.com/package/honcho-ai)
Honcho Demo [YouSim](https://yousim.ai) went [viral](https://x.com/courtlandleer/status/1851009358752076261)!
## Honcho v0.0.12
An overhauled Deriver and Dialectic API!
ADDED
- GitHub Actions Testing
- Ability to disable derivations on a session using the `deriver_disabled` flag in a session's metadata
- `/v1/` prefix to all routes
CHANGED
- Environment variable to control deriver workers
- Changed `public_ids` to use [NanoID](https://github.com/ai/nanoid) and internal ID to use `BigInt`
- Dialectic Endpoint can take a list of queries
- Using `uv` for project management
- User Representations stored in a metamessage rather than using reserved collection
- Base model for Dialectic API and Deriver is now Claude 3.5 Sonnet
- Paginated `GET` requests now `POST` requests for better developer UX
REMOVED
- Mirascope Dependency
- Slowapi Dependency
- Opentelemetry Dependencies and Setup
## Links
- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity

View File

@ -1,30 +0,0 @@
---
title: Release Notes 11.05.25
date: 11.05.25
tags:
- releases
- announcements
- honcho
- dev
---
# HONCHO v2.4.1-2
Stability, reliability, speed.
## ADDED
- Alembic migration validation test suite
## CHANGED
- Logging infrastructure to remove noisy messages
- Sentry integration is centralized
- Alembic to always use a session pooler
- Statement timeout during alembic operations to 5 min
## FIXED
- Alembic migrations to batch changes
- Batch message creation sequence number
- Langfuse tracing to have readable waterfalls
- Alembic Migrations to match models.py
- `message_in_seq` correctly included in webhook payload
# Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -1,29 +0,0 @@
---
title: Release Notes 11.20.25
date: 11.20.25
tags:
- releases
- announcements
- honcho
- dev
---
# HONCHO v2.4.3
Performance & Reliability
## News
Honcho.chat is live!
Try it now: [honcho.chat](https://honcho.chat/)
## ADDED
- Redis caching to improve DB IO
- Backup LLM provider to avoid failures when a provider is down
## CHANGED
- QueueItems to use standardized columns
- Improved Deduplication logic for Representation Tasks
- More fine-grained metrics for representation, summary, and peer card tasks
- DB constraint to follow standard naming conventions
# Links
- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)

View File

@ -5,17 +5,15 @@ tags:
- "#ml"
- blog
- research
author: Courtland Leer & Vince Trost
description: How we achieved state-of-the-art results on the OpenToM theory-of-mind benchmark using DSPy to learn few-shot examples with GPT-3.5-turbo.
---
![[robot_cafe.png]]
# TL;DR
*We used [DSPy](https://dspy-docs.vercel.app/) to achieve SOTA results on the [OpenToM](https://github.com/seacowx/OpenToM) benchmark using `gpt-3.5-turbo`. The benchmark's creators suggest language models fall short when modeling mental states and psychology, but we find using DSPy to learn few-shot examples leads to significantly outperforming all the models tested (`gpt-4-turbo` included) along this precise axis.*
*The fact you can learn few-shot examples to make a small, fast model perform just as well on a task as a large, slow one is significant. This signals to us a need to broaden the scope of methods for evaluating Theory of Mind capabilities in LLMs, because the social cognition needed to [[Humans like personalization |build great products]] goes far beyond just answering questions about stories.*
# The OpenToM Dataset
On February 14th, 2024, a paper dropped on arXiv introducing the OpenToM benchmark: a new dataset for evaluating Theory of Mind (ToM) in Large Language Models. ToM evals are typically borrowed from developmental psychology and consist of character-driven scenarios. The language model is asked to answer questions about various aspects of the characters' mental states. This ability has traditionally been thought to be uniquely human (or limited to very few species), but language models are starting to exhibit some level of proficiency in this task as well.
The authors of this paper point out how the characters in existing datasets lack personality traits or preferences, along with motivations for their actions. To remedy this, they devised a generation pipeline that does the following:
@ -43,10 +41,8 @@ Within Location there are *coarse* and *fine* questions and within both Location
- **Second Order**: inquires about a character's belief of another character's mental state
In the ToM space, only one prompting technique, "SimToM" [(Wilf, et al.)](https://arxiv.org/pdf/2311.10227.pdf), has shown improved results over Chain of Thought (CoT): a two-stage prompting framework that re-phrases the narrative from the perspective of the subject in question. CoT and SimToM are the only two techniques tested against the dataset in the paper.
# Experiments with DSPy
What makes the DSPy package interesting is the ability to abstract away the underlying prompts and examples if the task and metric are well defined. Anecdotally, we believe that LLMs are [[ARCHIVED; Theory of Mind Is All You Need|quite good]] at the psychological modeling the OpenToM authors suggest they "fall short" on. So we asked ourselves, "what if we could [[ARCHIVED; User State is State of the Art#^461ac9|learn]] the prompts and examples to optimize performance on this benchmark?"
This task is relatively easy to define in DSPy terms: `(context, question -> answer)`. This [guide](https://dspy-docs.vercel.app/docs/tutorials/simplified-baleen#optimizing-the-pipeline) was helpful in crafting our modules which can be found [here](https://github.com/plastic-labs/dspy-opentom/blob/main/cot.py). The authors of the OpenToM paper also released extensive [evaluation code](https://github.com/plastic-labs/dspy-opentom/blob/main/opentom_evaluator.py) which we leveraged heavily for parsing the LM's answers and assessing them.
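As a rough sketch (field descriptions and the docstring are ours; see the repo linked above for the real modules), the `(context, question -> answer)` task looks something like this in DSPy:

```python
import dspy

class AnswerQuestion(dspy.Signature):
    """Answer the question about the characters in the narrative."""
    context = dspy.InputField(desc="the OpenToM narrative")
    question = dspy.InputField(desc="a question about a character's mental state")
    answer = dspy.OutputField(desc="the answer to the question")

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        # Chain of Thought inserts a reasoning step before the answer
        self.generate_answer = dspy.ChainOfThought(AnswerQuestion)

    def forward(self, context, question):
        return self.generate_answer(context=context, question=question)
```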
@ -57,9 +53,7 @@ We conducted the following experiments:
3. Learn system prompts with the `SignatureOptimizer` and the `BayesianSignatureOptimizer`
Obviously there is much more we could have done, so if you're reading this and you have the time (and inferencing budget) to run more comprehensive experiments, [get in touch](https://discord.gg/plasticlabs) — we'd love to help!
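For reference, compiling the CoT module with the few-shot optimizer looks roughly like the sketch below. The metric shown is a simplified stand-in for the authors' F1-based evaluation code, and `trainset` is assumed to be a list of `dspy.Example` objects built from OpenToM.

```python
import dspy
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

def answer_match(example, pred, trace=None):
    # Simplified metric: the real evaluation parses answers with the
    # OpenToM authors' code and computes F1 scores
    return example.answer.lower() == pred.answer.lower()

optimizer = BootstrapFewShotWithRandomSearch(
    metric=answer_match,
    num_candidate_programs=25,  # we ran 25 candidate programs
)
compiled_cot = optimizer.compile(CoT(), trainset=trainset)  # 50 training examples
```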
# Results
The findings of our experiments were mixed but promising. The only experiment that showed positive results was compiling a CoT-prompted `gpt-3.5-turbo` module with the `BootstrapFewShotWithRandomSearch` optimizer. Neither of the signature optimizers nor using `gpt-4` as a teacher in `BootstrapFewShotWithRandomSearch` had much of an effect.
Our full experiment amounted to roughly $300 in inference costs, running 50 training examples on 25 candidate programs. We evaluated performance the same way the paper did, by randomly sampling 50 examples from a held-out set in 5 batches and computing average F1 scores. You can view our forum discussion in the DSPy Discord [here](https://discord.com/channels/1161519468141355160/1214629969318252574).
@ -79,9 +73,7 @@ The following table shows our results from experiment number one compared to the
On most of the question types, we see CoT-prompted `gpt-3.5-turbo` compiled with `BootstrapFewShotWithRandomSearch` examples outperforms both CoT-prompted base `gpt-3.5-turbo` as well as `mixtral`, and comes close to `gpt-4-turbo` performance — which is quite impressive! The exceptions here are fine, second-order location questions (which outperform `gpt-4-turbo` 🥳) and fine, first-order location questions (which underperform `gpt-4-turbo`). Due to budget constraints, we only tested `gpt-3.5-turbo`.
What's particularly interesting is the performance on the fine, second-order location questions (Loc$_{f}(S)$). As a reminder, second-order questions inquire about a character's belief of another character's mental state. This is the exact type of question the OpenToM authors claim that LMs perform poorly on, yet we saw that with our learned few-shot examples, it outperforms all of the other language models significantly.
# Analysis of Augmented Examples
The augmented examples from the compiled modules seem to mimic the format of the stories within each question type/granularity. You can see all of them on [GitHub](https://github.com/vintrocode/dspy-opentom/blob/main/cot_modules.pkl), but here are two examples:
**Attitude**:
@ -99,16 +91,14 @@ It's hard to parse out any specific patterns between the examples themselves. It
That's it? What was it about Ryker's affinity for raincoats that piqued his curiosity when it was hung up? Why would the story end there? The same thing basically happened in the first story, with Paxton throwing away the socks and Anderson never knowing about it.
In manually inspecting both the dataset and the augmented examples, it's clear that GPT-4 (the model used to generate the narratives) had a tendency to dramatize things. But it's still unclear as to why these examples (along with 16 others) were useful in increasing task performance. To borrow a quote from [Battle and Gollapudi](https://arxiv.org/pdf/2402.10949.pdf), "the only real trend may be no trend". Maybe counterintuitively, this is still an important result.
# Towards Better Theory of Mind Evals
The OpenToM authors were correct in identifying common pitfalls with existing ToM tests and their contributions with the dataset are a significant step forward. However, we still believe these tests are fundamentally flawed in an AI context.
We know that any observed "reasoning" in language models is due to behaviors learned in training. These tests are assessing their abilities to answer correctly in a single inference, which is both impressive and completely unrealistic. Real AI products already have access to memory, tools, multiple inferences, and more. They're going to be interacting with humans in more and more social settings, not trying to answer questions about hypothetical stories. Humans and agents are much more complex than that.
There was a time when people were upset at the inability to interpret features learned by neural networks. People have mostly moved on from that limitation in favor of the improved performance, so maybe it's time to do the same here. It follows the design philosophy of DSPy to abstract away the need to manipulate explicit prompts and examples to improve performance on a task. The examples it settled on were learned — DSPy worked exactly how it's supposed to. Deep learning uses neurons in a network to learn latent, arbitrary features optimized against an objective. The abstraction has just moved up a layer to the space of prompts that can be used to optimize against an objective.
Thus, the ability to achieve near `gpt-4-turbo` performance (and sometimes exceed it) with a "less powerful" language model that just learns the right examples to seed its generations is incredibly significant. If it can be done in these narrow tasks, it follows that there exists a vast space of other tasks this can be done for. Humans have nearly [[ARCHIVED; User State is State of the Art|infinite "states"]] to make ToM predictions about, so we're going to have to be able to do this repeatedly in order to effectively learn and update our models over time.
Major thanks go to [Jacob Van Meter](https://www.linkedin.com/in/jacob-van-meter-nc/) for his significant contributions to this project, [Omar Khattab](https://twitter.com/lateinteraction) and the [DSPy](https://dspy-docs.vercel.app/) team, as well as the [OpenToM](https://github.com/seacowx/OpenToM) authors for moving the ToM space forward. You can see all of our code and data [here](https://github.com/plastic-labs/dspy-opentom/tree/main).

View File

@ -1,20 +1,21 @@
---
title: Can AI Models Predict What You'll Say Next? Developing Verifiable Social Rewards
date: 02.28.25
tags:
- research
- ml
author: Dani Balcells
description: Developing verifiable social rewards for AI--benchmarking LLMs on next-message prediction in conversations & discovering that reasoning models underperform on social cognition.
---
# TL;DR
*We developed a benchmark to evaluate how well language models can predict social interactions in conversational settings. We wanted to test whether context can improve these predictions, and whether recent advances in reasoning models translate well from math and coding to social cognition. By testing various models on the task of predicting the next message in real Discord conversations, with and without different types of context, we found that Claude 3.7 Sonnet significantly outperforms other models in its non-reasoning variant, while its reasoning variant performed between 10 and 15 percentage points worse. We discovered that generating context summaries with a smaller model (Llama 3.3 70B) and injecting these into inference yields comparable or better results than providing raw conversation history. On one hand, we're excited that this validates key aspects of the [[ARCHIVED; Theory of Mind Is All You Need|thesis behind our product Honcho]]. On the other hand, we discovered that models highly optimized for technical reasoning often underperform on social cognition tasks.*
*Check out the code [here](https://github.com/plastic-labs/next-message-prediction-public).*
![Figure 1: Model performance across different context modes](model_performance_by_context_mode.png)
*Figure 1. Next-message prediction accuracy (%) by model and context mode. Error bars show standard error over three different runs with different random seeds to shuffle the order of the options.*
# Finding Verifiable Social Rewards
The machine learning community has made significant progress optimizing language models for tasks with clear, verifiable answers, like math, coding, and factual reasoning. These domains offer what are called "verifiable rewards": objective measures that can be used for reinforcement learning without relying on human preferences or subjective judgments. While this approach has yielded impressive results for technical reasoning, at Plastic Labs we've become increasingly curious about whether similar verifiable reward structures could be developed for social intelligence.
Here, by social intelligence we mean the ability to accurately interpret others' intentions, emotions, and likely behaviors in social contexts--essentially modeling other minds to predict social outcomes. In this sense, our social cognition is as essential to our functioning as having a robust predictive model of physics, our environment and proprioception. While humans develop this ability naturally through social feedback (successful predictions are "rewarded" with smoother interactions), creating objective measures for this in AI systems remains challenging.
@ -24,12 +25,12 @@ To address this gap, we developed a multiple-choice next-message prediction task
This creates a clear, verifiable reward signal for social understanding: either the model correctly identifies the real message or it doesn't. Yet unlike many technical tasks, success requires the model to understand conversational dynamics, recognize individual communication patterns, track context across multiple turns, and model how different people behave in specific social contexts.
This benchmark also allows us to test whether models specifically optimized for technical reasoning generalize to social understanding, and to get a granular, quantifiable understanding of models' social reasoning abilities.
# Prior work & inspiration
At Plastic Labs, our journey into AI social cognition began with our experimental tutor, Bloom. We discovered that giving AI systems autonomy to [[ARCHIVED; Theory of Mind Is All You Need|reason about the user's psychology]] led to dramatic improvements in performance. By allowing models to predict users' mental states and identify what additional information they needed, we found that AI systems could develop a nascent theory of mind for each user. This approach, which we later formalized in our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|research]] on metacognitive prompting, demonstrated that social context reasoning can significantly reduce prediction errors in large language models.
With recent work on reasoning models, including DeepSeek's R1, showing remarkable gains through reinforcement learning on mathematical and coding tasks, we're particularly interested in developing verifiable social rewards that could drive similar improvements in social reasoning. Unlike technical domains with clear right and wrong answers, social prediction introduces unique challenges--yet, establishing benchmarks in this area could unlock entirely new dimensions of AI capability that are crucial for creating systems that truly understand and adapt to human users.
# Methodology
## Dataset Creation
We created our dataset by extracting conversation snippets from our internal team Discord channels (accessible only to our core team of 5-10 people). Each snippet contained:
- 6-10 messages between exactly two participants.
@ -61,7 +62,7 @@ We ended up with 123 snippets—below is an example:
> [!question]- Can you guess the right answer?
> D! Classic Vince being Bayesian.
## Context Modes
Upon visual inspection of the resulting dataset, we found that the decoys were remarkably similar to the real messages, making it difficult even for us to consistently identify the genuine response. We wondered if providing additional context about the users might help determine the correct answer, which led us to explore different context modes:
1. **No Context**: Models only received the immediate conversation snippet and the four options.
@ -69,7 +70,7 @@ Upon visual inspection of the resulting dataset, we found that the decoys were r
3. **Summary Context**: Models received the conversation snippet plus a generated personality profile of the target user, created by processing the previous 50 or 100 messages through Llama 3.3 70B. The prompt used to generate this summary is available in the [project repo](https://github.com/plastic-labs/next-message-prediction-public/blob/950384174023ba315b628d3ba7bdb7c00b918544/generate_dataset.py#L156) on GitHub.
This design allowed us to compare whether any context provides useful signals for predicting social behavior, and whether a summary can provide results comparable to the full context.
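In code, the three modes amount to swapping what gets prepended to the snippet. Here's an illustrative sketch (prompt wording and the function name are ours, not the exact strings from the project repo):

```python
def build_prompt(snippet, options, mode, history=None, summary=None):
    # snippet: list of message strings; options: four candidate next messages
    parts = []
    if mode == "raw" and history:
        # Raw context: the target user's previous 50 or 100 messages
        parts.append("Earlier messages from this user:\n" + "\n".join(history))
    elif mode == "summary" and summary:
        # Summary context: a personality profile generated by Llama 3.3 70B
        parts.append("Profile of this user:\n" + summary)
    parts.append("Conversation:\n" + "\n".join(snippet))
    parts.append("Which message did the user actually send next?")
    parts.extend(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    return "\n\n".join(parts)
```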
## Experimental Setup
We tested a wide range of models including:
- Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3.5 Haiku.
- GPT-4.5, GPT-4o, GPT-4o Mini, O-1, O-3 Mini.
@ -79,15 +80,15 @@ We tested a wide range of models including:
- DeepSeek models (Chat and R1).
For each model and context mode combination, we ran three trials with different random seeds to control for position bias in option selection. Ideally we would have run more trials, but we wanted to constrain the compute needed for this experiment.
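A sketch of the trial protocol, reusing the hypothetical `build_prompt` above, with `query_model` standing in for the actual API call:

```python
import random

def run_trials(snippets, model, mode, seeds=(0, 1, 2)):
    accuracies = []
    for seed in seeds:
        rng = random.Random(seed)
        correct = 0
        for s in snippets:
            options = s["decoys"] + [s["real_message"]]
            rng.shuffle(options)  # different option order per seed
            prompt = build_prompt(s["messages"], options, mode)
            pred = query_model(model, prompt)  # returns an option index 0-3
            correct += int(options[pred] == s["real_message"])
        accuracies.append(correct / len(snippets))
    # Figure 1 reports the mean and standard error over the three seeds
    return accuracies
```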
# Results and Discussion
The results of our experiment are shown in Figure 1. In this section, we analyze them in detail and provide some insights and interpretation.
![Figure 1: Model performance across different context modes](model_performance_by_context_mode.png)
*Figure 1. Mean next-message prediction accuracy (%) by model and context mode. Error bars show standard error over three different runs with different random seeds to shuffle the order of the options.*
## Context Helps Regardless of Form
Additional context helps models predict social behavior, whether that context is provided as raw conversation history or as a processed summary. Moving from no context to either raw or summary context yielded substantial improvements for virtually all models tested. This confirms what might seem intuitive: knowing more about someone helps predict what they might say next.
## Efficient Context Processing Works
What's particularly significant is that injecting pre-processed summaries of user context works as well as or better than providing raw context for most models. This has important implications for system design:
1. The summaries contain far fewer tokens than raw context (approximately one paragraph versus potentially thousands of tokens).
@ -97,28 +98,27 @@ What's particularly significant is that injecting pre-processed summaries of use
This supports a core [thesis](https://blog.plasticlabs.ai/blog/Theory-of-Mind-Is-All-You-Need) behind Honcho: ambient processing of user context to generate compressed representations can improve model performance while keeping inference costs manageable. Rather than injecting massive amounts of data into the context window, models can achieve better results with distilled personality profiles.
We didn't observe significant performance differences between 50-message and 100-message contexts, suggesting there may be diminishing returns beyond a certain point. This is likely dependent on factors like user count and conversation density.
## Newest Models Lead the Way
Only the newest models perform well on this task. Claude 3.7 Sonnet and GPT-4.5 (both released last week) were the only models to achieve accuracy significantly above 40% in any context mode, with Claude 3.7 (non-thinking) reaching nearly 60% accuracy with summary context—more than doubling the 25% random baseline.
This is particularly interesting because tasks that would have seemed impossible for models that existed just months ago are now becoming tractable. This rapid progress also informs how we should think about designing evaluations—creating hard tasks that aren't saturated from the start rather than ones where models already perform at ceiling.
## Different Models Benefit from Different Contexts
While summary context generally outperformed raw context, this pattern wasn't universal. Some models (notably Claude 3.5 Sonnet and GPT-4.5) performed better with raw context than with summaries. This suggests different architectures may vary in their ability to extract relevant information from different types of context.
## Reasoning vs Social Understanding Trade-offs
The relatively poor performance of models optimized for technical reasoning, like Claude 3.7 Sonnet (thinking), DeepSeek R1, and OpenAI's O-1 and O-3 Mini, raises interesting questions. Despite their strong results on math and coding benchmarks, these models achieved well below random performance on our social prediction task.
This suggests potential trade-offs in model optimization. The reinforcement learning or supervised fine-tuning techniques used to enhance reasoning abilities might come at the expense of social cognition capabilities. However, without access to the architectures, data and training procedures that major labs like Anthropic and OpenAI use to build these models, it's hard to know exactly what might be causing models like Claude 3.7 Sonnet and GPT-4.5 to perform so much better on this task.
## Caveat: Decoy Generation
We should note that our decoys were generated using Claude 3.7 Sonnet, which was also the best-performing model on the task. It's possible that Claude 3.7 is better at recognizing the subtleties in its own generations. However, this almost creates a generative adversarial setup—Claude 3.7 is both generating challenging decoys and trying to identify them—which makes its strong performance even more notable.
# Future Directions
## Verifiable Social Rewards for RL
So far, we've used this task purely as an evaluation metric, but with a large enough dataset, it could potentially serve as a reward signal for reinforcement learning. This would allow for optimization of social cognition abilities with objective metrics, similar to how technical reasoning has been enhanced. Expanding our toolkit of objective social evaluation metrics could help bridge the gap between technical and social intelligence.
## Social-Reasoning Balance
Can we develop training techniques that enhance reasoning capabilities without sacrificing social cognition? This might involve carefully designed datasets that balance technical and social tasks, or novel fine-tuning approaches that preserve multiple types of capabilities. Understanding the apparent trade-off between these abilities could be crucial for developing more well-rounded AI systems.
## Context Optimization and Alternative Approaches
We're also interested in exploring several technical improvements to the methodology: finding the minimum effective context window size across different environments; testing different prompting techniques and models for generating personality summaries; experimenting with combinations of raw and summary contexts; and trying different models for decoy generation to address potential advantages Claude 3.7 might have in recognizing its own outputs.
# Conclusion
We were excited to find that this social prediction task was genuinely challenging for most current models, with only the very latest releases showing strong performance. The fact that models optimized for reasoning performed poorly suggests interesting trade-offs in current training approaches. Meanwhile, the effectiveness of pre-processed context summaries supports a key principle behind Honcho: ambient processing of user context can significantly improve personalization while managing compute costs.
Check out the code [here](https://github.com/plastic-labs/next-message-prediction-public). We used our private Discord messages for the experiment so we're unable to publish our own dataset, but the repository contains instructions to replicate the experiment with your own data. If you have any questions, feel free to ask on GitHub!

View File

@ -1,16 +1,14 @@
---
title: Evaluating Steerability in Large Language Models
date: 12.14.24
tags:
- research
- ml
author: Dani Balcells
description: A new benchmark framework for measuring how well AI systems can adapt to different personas, implementing the first trade-off steerable benchmark.
---
# TL;DR
*This is a research update on our ongoing work to implement concrete benchmarks for measuring AI systems' ability to adapt to different users. We've created what we believe is the first implementation of a "trade-off steerable benchmark" - a framework proposed by Sorensen et al. for evaluating how well AI systems can be steered to reflect different perspectives. While we've made progress on the core dataset and evaluation pipeline, several key questions remain about how to make this benchmark as useful as possible to the research community. We're sharing this update to gather feedback at NeurIPS 2024 in Vancouver on the most valuable directions to take this work.*
# 1. Measuring AI Systems' Ability to Adapt to Different Users
At Plastic Labs, we're building AI systems that can adapt to and act on behalf of their users. As we continue to improve these systems, it's critical that we can reliably measure their ability to faithfully represent different people's views and behaviors.
@ -19,7 +17,6 @@ Today we're introducing a new evaluation framework that systematically tests an
The AI community has made remarkable progress in building powerful language models that can engage in open-ended dialogue. However, these models are typically aligned through techniques like RLHF that optimize for a single set of "average" human preferences. This approach falls short when we want AI systems that can truly adapt to individual users with different values, personalities and preferences.
Recent work has established the importance of pluralistic alignment - ensuring AI systems can faithfully represent diverse human perspectives. While conceptual frameworks for measuring this capability have been proposed, notably by Sorensen et al., the authors acknowledge that to their knowledge no concrete implementations of these frameworks exist yet. This makes it difficult to assess progress or compare different approaches.
## Our Approach
We've created an evaluation framework that systematically measures an AI system's ability to adapt to different personas. The core idea is simple: we give the system a few examples of how a persona thinks and behaves, then test whether it can accurately predict that persona's views on new scenarios. By testing many different personas and comparing how well each steered version of the system maintains fidelity to its target persona, we can quantify how "steerable" the system is.
@ -28,10 +25,8 @@ Our research questions include:
- How well do simple steering approaches like few-shot learning actually perform?
In the following sections, we'll detail our methodology and share initial results that shed light on these questions. We hope this work helps establish more rigorous ways to evaluate AI systems' ability to reflect human diversity.
# 2. Creating a Dataset to Test Personality Adaptation
To evaluate an AI system's ability to adapt to different personas, we first needed a dataset of diverse personalities and their characteristic behaviors. We approached this as a careful balance between coverage, quality and cost - we wanted to represent a wide range of human personalities while ensuring the data was reliable enough to serve as ground truth, all while keeping the time and compute required to develop the dataset to a reasonable minimum.
## Seeding Diverse Personas
For our initial implementation, we needed a systematic way to generate personas that would exhibit meaningfully different attitudes and behaviors. While recent work like the Billion Personality Dataset has explored prompting LLMs with simple role descriptions like "a musician interested in audio processing" or "a moving company driver", there's no guarantee such prompts will produce distinct behavioral patterns. Instead, we used five well-known personality frameworks (Myers-Briggs Type Indicator, Enneagram, Big Five, Zodiac signs, and Tarot archetypes) that each attempt to provide complete coverage of human personality space.
@ -93,7 +88,6 @@ The binary agree/disagree format enables reliable scoring while minimizing measu
# 3. Methodology: Measuring Steerability
## The Core Task: Steering and Testing
Our evaluation framework measures how well a given system can steer to different personas. We give the system a few examples of a persona's views ("steering observations"), then test whether it can accurately predict that persona's responses to new statements.
Formally, we define:
@ -120,7 +114,6 @@ For example, to test adaptation to an INFP personality:
To measure the overall steerability of the system, we repeat the process above for all personas and average the resulting percentile rank scores.
We show the preliminary results of running this evaluation framework on few-shot steerable systems - baseline systems that implement steering by including the steering observations in their system prompt formatted as "you are role-playing as a person that agrees with the following statements: \[agree observations] and disagrees with the following observations \[disagree observations]". We use the same few-shot prompt on GPT-4o Mini, Gemini 1.5 Flash and Claude 3.5 Sonnet.
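A minimal sketch of that baseline steering prompt (the wording follows the quoted template; the function name and formatting are ours):

```python
def steering_prompt(agree_obs: list[str], disagree_obs: list[str]) -> str:
    return (
        "You are role-playing as a person that agrees with the following "
        "statements:\n"
        + "\n".join(f"- {s}" for s in agree_obs)
        + "\nand disagrees with the following statements:\n"
        + "\n".join(f"- {s}" for s in disagree_obs)
    )

# The steered system is then asked to predict agree/disagree for each
# held-out statement from the same persona.
```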
# 4. Results and Discussion
## Score Matrix Analysis

View File

@ -1,29 +1,24 @@
---
title: Introducing Neuromancer XR
date: 08.18.25
tags:
- research
- ml
- "#neuromancer"
subtitle: Our Reasoning Model for State-Of-The-Art Memory
author: Dani Balcells
description: Meet Neuromancer XR--our custom reasoning model that achieves state-of-the-art memory by extracting & scaffolding logical conclusions from conversations.
---
![[opengraph_neuromancer.png]]
# TL;DR
*Memory is a foundational pillar of social cognition. As a key component of [Honcho](https://honcho.dev), we approach it as a combined reasoning and retrieval problem. In this post, we introduce Neuromancer XR, the first in a series of custom reasoning models that works by extracting and scaffolding atomic conclusions from user messages across two strictly defined levels of logical certainty: explicit and deductive. It's the result of fine-tuning Qwen3-8B on a manually curated dataset mapping conversation turns to atomic conclusions. Using Neuromancer XR as the reasoning engine behind our core product Honcho leads to 86.9% accuracy on the [LoCoMo](https://snap-research.github.io/locomo/) benchmark, compared to 69.6% using the base Qwen3-8B model, and 80.0% when using Claude 4 Sonnet as baseline, to achieve state of the art results. The next model in the series, Neuromancer MR will extract and scaffold observations at two further levels along the spectrum of certainty: inductive and abductive. This will allow us to front-load most of the inference needed to improve LLMs' social cognition skills, powering AI-native products that truly understand any peer in a system, be it a user or an agent.*
# Table Stakes
At Plastic, we want to enable builders to create AI applications and agents with exceptional social intelligence: tools that are able to understand who you are and what you mean, whether it's an AI tutor that adapts to your learning style or a multi-agent system that anticipates your needs. These applications all require something fundamental that's only recently begun to draw attention: memory.
Most approaches treat memory as an end product or top-level [[Memory as Reasoning#Memory is ~~Storage~~ Prediction|feature]], enabling information to persist across chatbot sessions, but we consider it the foundation of something much bigger: the ability for LLMs to build mental models of their users and one another and draw from those representations in real time. This capability is essential for personalization, engagement, and retention. Not to mention multi-agent systems, individual alignment, and the trust required for agentic behavior. It's the difference between an AI that merely responds to queries and one that genuinely understands and adapts to the person it's talking to; the difference between out-of-the-box experiences and ones cohered to a user's personal identity.
To do anything approaching the social cognition required, Honcho must be state-of-the-art in memory: able to recall observations about users across conversations with superhuman fidelity. Today, we're sharing our approach and early results from training a specialized model that treats [[Memory as Reasoning|memory as a reasoning task]] rather than simple static storage.
# Memory as Reasoning
Reasoning models continue to surge in capability and popularity, and our approach to memory is built around them. Why not design memory as a reasoning task concerned with deliberating over the optimal context to synthesize and remember? We turned to formal logic to develop four methods of reasoning, along a spectrum of certainty, toward conclusions to derive from conversational data:
- **Explicit**: Information directly stated by a participant.
@ -91,20 +86,17 @@ Reasoning models continue to surge in capability and popularity. And with them,
> > > - Erin probably has a growth mindset (transformed health concern into athletic goal, combines activities like reading while running)
Having clear definitions for these four types of reasoning and their corresponding levels of certainty also allows us to establish how different kinds of observations relate to one another. Specifically, we require observations to scaffold only on top of observations with higher certainty: an abduction (e.g. "Erin values her health proactively") can use a deduction (e.g. "Erin exercises regularly") or induction (e.g. "Erin prioritizes healthy eating during weekdays") as one of its premises, but not the other way around. That is, one can speculate given a certain conclusion, but one cannot conclude something logically from a prediction. Implied in this is that the model must show its work: a conclusion must include its premises, its evidence and support.
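A minimal sketch of the scaffolding constraint (names and structure are ours, not Honcho internals): each observation carries a certainty level, and a conclusion may only cite premises of strictly higher certainty.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class Certainty(IntEnum):
    ABDUCTIVE = 1  # least certain
    INDUCTIVE = 2
    DEDUCTIVE = 3
    EXPLICIT = 4   # most certain

@dataclass
class Observation:
    content: str
    certainty: Certainty
    premises: list["Observation"] = field(default_factory=list)

    def __post_init__(self):
        # Speculate from conclusions; never conclude from predictions
        for p in self.premises:
            if p.certainty <= self.certainty:
                raise ValueError("premises must be more certain than the conclusion")

runs = Observation("Erin exercises regularly", Certainty.DEDUCTIVE)
health = Observation("Erin values her health proactively", Certainty.ABDUCTIVE, [runs])  # ok
# Observation("Erin said she runs", Certainty.EXPLICIT, [health]) would raise ValueError
```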
# Neuromancer XR: Training a Logical Reasoning Specialist for Memory
To implement this vision, we need a model that can reliably extract and categorize conclusions from conversations. Our initial focus for the memory task, given its emphasis on factual recall, is on the first two certainty levels: explicit and deductive knowledge--that is, conclusions we know to be true given what users (or agents) state in their messages.
We generated a proprietary dataset of approximately 10,000 manually curated instances of conclusion derivation, creating memory-reasoning traces from conversational data. Each instance shows how to process a conversation turn and derive the relevant conclusions at appropriate certainty levels. We then fine-tuned Qwen3-8B on these traces.
The resulting model is Neuromancer XR (for eXplicit Reasoning), a model specialized in deriving explicit and deductive conclusions from conversational data. It is currently in production powering the latest release of [Honcho](https://www.honcho.dev).
## Integration with Honcho
![[neuromancer_honcho_diagram.png]]
*Figure 1. Diagram of the Honcho workflow.*
Whenever a message from a [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (any user or agent in an interaction) is stored in Honcho, Neuromancer XR reasons about it to derive explicit and deductive conclusions, which are then stored specifically to that peer. This forms a reasoning tree that constitutes our most current representation of each peer. Optionally, the conclusion derivation step can fetch additional context from the peer to enrich its reasoning. Our [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic endpoint]] then allows builders or agents to ask questions about peers in natural language by retrieving and synthesizing reasoning from the representation relevant to the question being asked.
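Concretely, the loop looks something like the following self-contained sketch; every class and function here is an illustrative stand-in, not the actual Honcho SDK surface:

```python
from dataclasses import dataclass, field

# Illustrative stand-ins for the ingestion/query flow described above.


@dataclass
class PeerRepresentation:
    """Reasoning tree of certainty-labeled conclusions about one peer."""
    conclusions: list[str] = field(default_factory=list)

    def insert(self, derived: list[str]) -> None:
        self.conclusions.extend(derived)


def derive_conclusions(message: str) -> list[str]:
    """Stand-in for Neuromancer XR: explicit + deductive conclusions.

    In production this step can optionally fetch additional peer
    context to enrich its reasoning.
    """
    return [
        f"explicit: the peer stated {message!r}",
        "deductive: the peer runs",
    ]


def dialectic(rep: PeerRepresentation, question: str) -> str:
    """Stand-in for the dialectic endpoint: retrieve, then synthesize."""
    relevant = [c for c in rep.conclusions if "run" in c]
    return f"{question} -> synthesized from {len(relevant)} observations"


rep = PeerRepresentation()
rep.insert(derive_conclusions("I swapped my morning runs for evenings"))
print(dialectic(rep, "When does this peer prefer to exercise?"))
```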
# Evaluation
Although the Honcho workflow allows us to answer arbitrary questions about a peer, from the purely factual to the predictive, it's important for us to be able to benchmark its raw memory abilities--how accurately it can recall factual information shared by a user in a conversation.
@ -146,27 +138,21 @@ This can lead to poor embedding quality, making retrieval more difficult, or add
We further speculate that deciding what information to extract from a conversation turn for memory purposes is well within the reach of small models, since it's mostly a matter of identifying and correctly rephrasing information already present in the text, plus making small logical deductions from it. This contrasts, however, with the more complex tasks AI-native memory and social cognition demand--inferring user intent and theory of mind among them--which require generating substantial amounts of information not present in the text itself.
# Directions for future work
We're training a model for the remaining two levels of logical certainty in our framework: inductive and abductive. The next model in the Neuromancer series, Neuromancer MR (for meta-reasoning), will take on this task.
This model will reason about reasoning, focusing on the predictive side of the certainty spectrum. It will let us derive likely explanations and probable hypotheses for broad patterns of user or agent behavior at the moment of ingestion, bolstering the density and utility of peer representations. We're developing internal evaluations for this task, as none currently exist for this frontier of synthetic social cognition.
## Front-loading social reasoning inference
One of the advantages of this memory framework is that it lets us front-load much of the meta-cognitive inference required to improve LLMs' social intelligence and theory-of-mind capabilities. In our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|prior research]], as early as 2023, we showed that letting LLMs reason over conversational data in a chain-of-thought style allows them to develop high-fidelity models of users' mental states.
Most other LLM frameworks store atomic, low-level "facts" about users and include them as context at generation time. In theory, with enough carefully prompted inference-time compute, this would allow a good enough model to develop abstract theories about the user's mental state as it tries to answer a query about the user. But that reasoning would have to happen implicitly in the model's thought process, which means its theories about the user's mental state are ephemeral, opaque, and unpredictable. Such approaches are therefore inconsistent and inefficient, and would struggle to meet the challenges of true social cognition.
Our approach, on the other hand, shifts most of the load of reasoning about the peer from generation time to the earlier stages of the process, when messages are processed and ingested. By the time observations are retrieved for generation, low-level messages have already been distilled and scaffolded into a hierarchical, certainty-labeled, easy-to-navigate tree containing a high-fidelity user representation.
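Here's a toy sketch of what retrieval can then reduce to--filtering certainty-labeled observations rather than reasoning over raw messages at query time. All names and data below are invented for illustration:

```python
# A pre-built representation: the heavy reasoning already happened at
# ingestion, so each observation carries its certainty label.
observations = [
    {"content": "Erin exercises regularly", "certainty": "deductive"},
    {"content": "Erin prioritizes healthy eating during weekdays", "certainty": "inductive"},
    {"content": "Erin values her health proactively", "certainty": "abductive"},
]

CERTAINTY_ORDER = ["abductive", "inductive", "deductive", "explicit"]


def retrieve(terms: list[str], min_certainty: str = "inductive") -> list[str]:
    """Return observations at or above a certainty floor matching any term."""
    floor = CERTAINTY_ORDER.index(min_certainty)
    return [
        o["content"]
        for o in observations
        if CERTAINTY_ORDER.index(o["certainty"]) >= floor
        and any(t in o["content"].lower() for t in terms)
    ]


print(retrieve(["exercise", "health"]))
# -> ['Erin exercises regularly', 'Erin prioritizes healthy eating during weekdays']
```

A factual query can hold the certainty floor high, while a predictive one can lower it to admit inductions and abductions.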
## Beyond recall: toward social intelligence
Evaluations and benchmarks are essential tools for building better AI-native frameworks. They don't tell the whole story, though: no evaluation is perfect, and hill-climbing can easily mislead us into optimizing for higher scores rather than the true north star, the overall quality of our product. For us, that means treating memory not as a hill to die on, but as table stakes in our pursuit of social cognition that can truly transform the way AI-native tools understand us. Although success at this broader goal is much harder to quantify with conventional benchmarks, given the complex and under-specified nature of social cognition, we will continue to implement the evaluations we find most helpful for our agile development process.
In that spirit, we have our sights set on the remaining two levels of certainty introduced at the beginning of this post: inductive and abductive. In our preliminary manual testing, including all four levels of reasoning produced incredibly rich user representations from even the simplest interactions. What lies ahead is the exciting task of harnessing these representations and delivering them via Honcho in the fastest, most flexible, and most agentic way possible.
## Some Notes on Model Naming
# Some Notes on Model Naming
> Personality is my medium.
>
> -*Neuromancer* (Gibson, 1984)
@ -178,8 +164,6 @@ The character Neuromancer is an AI tasked with transmuting personal identity fro
In many ways, this is analogous to Plastic's mission to create representations of personal identity of such high fidelity that they asymptotically approach the full complexity of the original person. But more specifically, our Neuromancer models are tasked with reasoning about user (or agent) data to create and scaffold the atomic conclusions from which we build those representations.
So not only does the name fit, but it also honors and strives toward the incredible ambition of Gibson's vision still yet to be realized 40 years later.
# Appendix A: LLM-as-judge design and prompt
In our evaluation of the three models we tested, we used the stock GPT-4o-mini as an LLM-as-judge to label responses as correct or incorrect, using the prompt reproduced below. This choice stems from several factors, which we outline in this appendix.
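As an illustration of how such a judge is wired up, here's a sketch using the OpenAI client; `JUDGE_PROMPT` is a hypothetical stand-in for the actual prompt, which is reproduced below:

```python
from openai import OpenAI

# JUDGE_PROMPT is a hypothetical stand-in for the actual prompt below.
JUDGE_PROMPT = (
    "You are grading a memory system. Given a question, the gold answer, "
    "and the system's response, reply with exactly CORRECT or INCORRECT."
)

client = OpenAI()


def judge(question: str, gold: str, response: str) -> bool:
    """Label a single response as correct (True) or incorrect (False)."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic labels aid reproducibility
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {
                "role": "user",
                "content": f"Question: {question}\nGold answer: {gold}\n"
                           f"Response: {response}",
            },
        ],
    )
    return completion.choices[0].message.content.strip() == "CORRECT"
```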
View File
@ -1,22 +1,18 @@
---
title: "SPIRAL: Letting LLMs Teach Themselves Through Self-Play"
author: Dani Balcells
date: 08.15.24
date: 08.15.25
tags:
- research
- ml
- reinforcement
- learning
- rl
author: Dani Balcells
description: How self-play on text games develops generalizable reasoning skills in LLMs--achieving 8.6% math improvement from training on poker with no mathematical content.
---
![[selfplay.png]]
*Source: [Liu, Guertler et al., 2025](https://arxiv.org/abs/2506.24119).*
## TL;DR
# TL;DR
_We collaborated with the TextArena team to develop SPIRAL, a novel RL framework that allows LLMs to develop complex reasoning capabilities by playing text-based games against themselves. Using SPIRAL on a simplified variant of poker with no mathematical content, a 4B-parameter Qwen model improved its performance on math and reasoning benchmarks by 8.6% and 8.4% respectively. It does this by learning specific strategies, such as case-by-case analysis and expected value calculation, that generalize beyond poker better than simple game heuristics. We're excited to explore whether self-play on social deduction games like Mafia can lead to general improvements in LLMs' social cognition._
---
## Teaching Social Cognition Through Games
# Teaching Social Cognition Through Games
At Plastic Labs, one of our key research interests is improving language models' social cognition: their ability to represent people's mental states, predict users' behaviors, and navigate complex social dynamics. This capability is essential for creating AI systems that can genuinely understand and adapt to individual users, yet it remains underdeveloped compared to technical abilities and so-called "hard skills" like reasoning and coding.
Complex skills like social cognition present unique challenges for supervised learning, arguably the dominant paradigm in machine learning, in which models are given labeled examples of correct behavior. Unlike conventional language-modeling tasks such as question answering or translation, social understanding involves nuanced judgments about beliefs, intentions, and interpersonal dynamics. For social reasoning, creating comprehensive labeled datasets of correct behavior is not just expensive but often ill-posed and under-specified, given how hard it is to define what the right answer should be in the first place.
@ -28,9 +24,7 @@ These approaches have primarily focused on domains with verifiable answers: math
Our research soon connected us with [Leon Guertler](https://x.com/leonguertler) and the [TextArena](https://www.textarena.ai) team, who were building a Python library designed for this exact purpose: providing text-only games as RL environments in the hopes that they might allow LLMs to acquire general skills. We discovered we were kindred spirits working on similar problems, and decided to collaborate.
This blog post introduces the first result of that collaboration: SPIRAL, a framework that allows LLMs to develop complex reasoning skills by playing text-based games against themselves.
## SPIRAL's Key Contributions
# SPIRAL's Key Contributions
The [SPIRAL paper](https://arxiv.org/abs/2506.24119) demonstrates that self-play on simple games can develop generalizable reasoning skills without any domain-specific training data. The experiments consisted of training Qwen3-4B-Base on Kuhn Poker—a minimal three-card poker variant—for just 400 training steps. Despite the game containing no mathematical content whatsoever, this training improved the model's performance on math benchmarks by 8.6% and general reasoning by 8.4%. Perhaps most surprisingly, the self-play approach outperformed a baseline trained using supervised fine-tuning on 25,000 expert game trajectories, suggesting that the competitive dynamics of self-play provide a more effective learning signal than imitation learning.
Self-play creates fundamentally different training dynamics than conventional approaches. When a model plays against continuously updating copies of itself, it faces an opponent that evolves in lockstep with its own improvements. This prevents the static exploitation patterns that emerge when training against fixed opponents: in the paper, we find that models trained against unchanging opponents like Mistral or Gemini initially struggle, then plateau once they discover winning exploits. Furthermore, given the zero-sum nature of the games, self-play forces models to develop genuine strategic reasoning that remains robust against an ever-adapting adversary.
@ -42,9 +36,7 @@ What makes it possible for the skills learned through SPIRAL to generalize beyon
- Pattern recognition, helping the model identify recurring structures and regularities, such as recognizing when an opponent's betting pattern signals strength.
The main technical innovation that enabled stable self-play training was Role-conditioned Advantage Estimation (RAE), designed to mitigate the effects of variance, a common challenge in multi-agent reinforcement learning. Facing a constantly changing opponent makes it difficult to determine whether a given positive reward should be attributed to good play or to a mistake by the opponent, which in turn makes model updates unreliable and unstable. RAE addresses this by maintaining separate baselines for each role in the game, normalizing rewards relative to the expected performance in that specific role. Without RAE, training often led to "thinking collapse", where gradients became unstable and eventually dropped to near zero, halting learning and producing nonsensical outputs.
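To make the mechanic concrete, here's a toy sketch of per-role baselines; the zero-sum "game" below is a coin flip, not SPIRAL's actual environment or update rule:

```python
import random

# Toy sketch of Role-conditioned Advantage Estimation (RAE): keep one
# running baseline per role and normalize each reward against the
# expected return for that specific role.

baselines = {"player_0": 0.0, "player_1": 0.0}  # per-role running means
ALPHA = 0.05  # baseline learning rate


def rae_advantage(role: str, reward: float) -> float:
    advantage = reward - baselines[role]   # normalize relative to the role
    baselines[role] += ALPHA * advantage   # update that role's baseline
    return advantage


for step in range(1000):
    # Self-play: one policy fills both seats, and rewards sum to zero.
    r0 = random.choice([1.0, -1.0])
    for role, reward in (("player_0", r0), ("player_1", -r0)):
        adv = rae_advantage(role, reward)
        # A policy-gradient step would scale its update by `adv`; the
        # per-role baseline keeps this signal's variance bounded even as
        # the opponent changes every step.
        _ = adv  # placeholder for the actual gradient step
```

The real implementation operates on policy-gradient updates over full game trajectories; the point here is only that normalizing per role keeps the learning signal meaningful when both seats share one set of weights.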
## Next Steps for Social Intelligence
# Next Steps for Social Intelligence
For Plastic Labs, SPIRAL is a first step pointing us in an intriguing direction: competitive self-play as an effective way to teach models complex skills without domain-specific supervision. It opens the door for us to explore using similar approaches to teach models social cognition specifically.
We're currently exploring whether social deduction games like Mafia, Avalon, and Werewolf are the natural next step for this approach. They require exactly the capabilities we want models to develop: maintaining accurate models of multiple agents' mental states simultaneously, detecting deception through subtle behavioral cues, building trust strategically, and managing the flow of information to achieve goals. Success in these games depends on genuine social understanding--precisely the components of social cognition that remain underdeveloped in current language models.
View File
@ -5,10 +5,10 @@ tags:
- research
- ml
- philosophy
author: Courtland Leer, Vince Trost, & Vineeth Voruganti
description: Research showing how predictive coding-inspired metacognitive prompting enhances LLM theory of mind abilities & reduces prediction error about users.
---
[Read on Arxiv](https://arxiv.org/abs/2310.06983).
Or download here:
[Read on Arxiv](https://arxiv.org/abs/2310.06983).
<iframe style="width: 100%;height: 50vh" src="https://arxiv.org/pdf/2310.06983.pdf"></iframe>
View File
@ -1,7 +1,7 @@
$pageWidth: 750px;
$mobileBreakpoint: 600px;
$tabletBreakpoint: 1000px;
$sidePanelWidth: 380px;
$sidePanelWidth: 308px;
$topSpacing: 6rem;
$fullPageWidth: $pageWidth + 2 * $sidePanelWidth;
$boldWeight: 700;
118
warp.md Normal file
View File
@ -0,0 +1,118 @@
# Plastic Labs Blog
This is the Plastic Labs blog, built with Quartz v4 - a static site generator for publishing digital gardens and notes.
## Project Overview
- **Framework**: Quartz v4 (built on top of Markdown processing with unified/remark/rehype)
- **Content Location**: `content/` directory
- `blog/` - Blog posts
- `research/` - Research content
- `extrusions/` - Extrusions content
- `notes/` - Notes
- `careers/` - Career-related content
- `releases/` - Release announcements
- **Static Assets**: `static/` directory (copied to public root during build)
- **Configuration**: `quartz.config.ts`
## Prerequisites
- Node.js >= 18.14
- npm >= 9.3.1
## Common Commands
### Setup
```bash
# Install dependencies
npm install
```
### Development
```bash
# Build and serve the site locally
npx quartz build --serve
# Build and serve docs specifically
npm run docs
```
### Code Quality
```bash
# Type check
npm run check
# Format code
npm run format
# Run tests
npm run test
```
### Git Workflow
```bash
# Check current branch
git branch
# Create new branch
git checkout -b your-branch-name
# Check status
git status
# Stage changes
git add .
# Commit changes
git commit -m "your message"
# Push to remote
git push origin your-branch-name
# Pull latest changes
git pull origin branch-name
# Pull with rebase (recommended when you have local commits)
git pull --rebase origin branch-name
```
## Configuration
The site is configured via `quartz.config.ts`:
- **Site Title**: 🥽 Plastic Labs
- **Base URL**: blog.plasticlabs.ai
- **Theme**: Custom dark/light mode with Departure Mono headers and Roboto Mono body
- **Analytics**: PostHog
- **Ignored Patterns**: `private/`, `templates/`
## Custom Features
- Custom static file copying plugin (CopyStatic)
- OpenGraph images with default `/og-image.png`
- RSS feed and sitemap generation
- SPA navigation enabled
- Popovers enabled
## Deployment
The site uses Docker for deployment (see `Dockerfile`).
## Branch Structure
- `v4` - Main production branch
- Feature branches follow pattern: `username/feature-name`
## Troubleshooting
### Push Rejected
If you get "rejected - fetch first" errors:
1. Pull with rebase to preserve your local commits: `git pull --rebase origin branch-name`
2. Then push: `git push origin branch-name`
### Dependencies Not Found
Run `npm install` to ensure all dependencies are installed.
## Resources
- [Quartz Documentation](https://quartz.jzhao.xyz/)
- [Discord Community](https://discord.gg/cRFFHYye7t)