mirror of
https://github.com/jackyzha0/quartz.git
synced 2025-12-19 10:54:06 -06:00
Merge branch 'v4' into sync-upstream
This commit is contained in: commit abfe8a99c6
@ -1,39 +1,40 @@
---
title: Home
enableToc: false
description: Welcome to Plastic Labs' blog.
---

> [!custom] WELCOME TO [PLASTIC LABS](https://plasticlabs.ai)
>
> Here you'll find our blog, research, and public notes. You can also [engage with the ideas directly](https://github.com/plastic-labs/blog).
>
> [Plastic](https://plasticlabs.ai) is an engineering-driven AI lab building at the intersection of machine learning and cognitive science.
>
> Our focus is developing [Honcho](https://honcho.dev/), an AI-native memory solution powered by our state-of-the-art [reasoning models](https://plasticlabs.ai/neuromancer). Honcho is a continual learning system for modeling personal identity, and soon a shared context layer for individual alignment.
>
> The foundational layer of intelligence being built is just the beginning. Latent among the scores of specialized secondary and tertiary layers yet to be realized exists one for personal identity.
>
> We're building it.

# Guide

We post a few different types of content here:

- [[blog|Blog]] - Deep dives into the cogsci, development, & ML underpinning our projects
- [[research|Research]] - Preprint or blog-style research we've made public
- [[notes|Notes]] - Short-form working notes on Plastic theses
- [[archive|Archive]] - Legacy content about out-of-date or deprecated projects & features
- [[careers|Careers]] - Open positions at Plastic

[*Subscribe to updates*](https://plasticlabs.typeform.com/mailing).

# Projects

Explore our active projects:

**PRODUCTS**

- [Honcho](https://honcho.dev) - AI-native memory & reasoning infra for apps & agents ( #honcho)
- [Neuromancer](https://plasticlabs.ai/neuromancer) - Reasoning models for memory & personal identity ( #neuromancer)

**DEMOS**

- [Honcho Chat](https://honcho.chat) - Honcho-powered AI-assistant platform with SOTA memory ( #chat)
- [Penny for Your Thoughts](https://www.pennyforyourthoughts.ai/) - Honcho/x402-powered personal expertise market ( #penny)
- [YouSim](https://yousim.ai) - Honcho-powered identity simulator ( #yousim)

**COMMUNITY**

- [Xeno Grant](https://x.com/xenograntai) - Direct-to-agent grants program ( #grants)
@ -1,27 +1,31 @@
---
title: "ARCHIVED: A Comprehensive Analysis of Design Patterns for REST API SDKs"
date: 05.09.24
tags:
- blog
- dev
- archive
author: Vineeth Voruganti
description: A deep dive into SDK design patterns, comparing object-oriented vs singleton approaches & evaluating code generation platforms for API client libraries.
---

> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
>
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post contains Vineeth's (Plastic's Co-founder & CTO) notes on REST API SDK design patterns that informed how we built Honcho's client libraries. Some patterns described here have been superseded by our shift toward LLM-native interfaces, but the analysis of pagination, error handling, & developer experience remains useful for anyone building API tooling.
>
> For the most up-to-date SDK reference, check out the [Honcho Docs](https://docs.honcho.dev).
>
> Enjoy.

*This post is adapted from [vineeth.io](https://vineeth.io/posts/sdk-development)*

# TL;DR

After several months of managing the SDKs for Honcho manually, we decided to take a look at the options available for automatically generating SDKs.

From our research we picked a platform and have made brand-new SDKs for Honcho that use idiomatic code, are well documented, and let us support more languages.

# Introduction

For the past few months I have been working on managing the [Honcho](https://honcho.dev) project and its associated SDKs. We've been taking the approach of developing the SDK manually as we are focused on trying to find the best developer UX and maximize developer delight.

This has led to a rather arduous effort that has required a large amount of refactoring as we are making new additions to the project, and the capabilities
@ -30,20 +34,15 @@ of the platform rapidly expand.
While these efforts have been going on, a new player in the SDK generation space dropped on [hacker news](https://news.ycombinator.com/item?id=40146505).

When I first started working on Honcho I did a cursory look at a number of SDK generators, but wasn't impressed with the results I saw. However, a lot of that was speculative and Honcho was not nearly as mature as it is now.

So spurred by the positive comments in the thread above I've decided to do a more detailed look into the space and also try to develop a better understanding of what approaches are generally favorable in creating API client libraries.

# Background

For a full understanding of Honcho I recommend the great [[ARCHIVED; A Simple Honcho Primer|Simple Honcho Primer]] post, but I'll try to summarize the important details here.

Honcho is a personalization platform for LLM applications. It is infrastructure that developers can use for storing data related to their applications, deriving
@ -82,9 +81,7 @@ session = user.create_session()
There is an Async version of the SDK with an `AsyncHoncho` class that uses objects such as `AsyncSession` and `AsyncUser`.
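To make the object-oriented shape concrete, here is a minimal sketch of what this kind of resource-object SDK surface looks like, in both sync and async flavors. The class and method names are illustrative stand-ins, not the exact Honcho client API.

```python
# Illustrative sketch of an object-oriented SDK surface (names hypothetical,
# not the real Honcho client): auth/config lives on a top-level client, and
# resource objects like User hand back further resource objects.
import asyncio


class Session:
    def __init__(self, user_id: str, session_id: int):
        self.user_id = user_id
        self.session_id = session_id


class User:
    def __init__(self, name: str):
        self.name = name
        self._session_count = 0

    def create_session(self) -> Session:
        # the resource object carries its own identifiers
        self._session_count += 1
        return Session(self.name, self._session_count)


class Honcho:
    """Top-level client: configuration and auth would live here."""

    def __init__(self, base_url: str = "https://demo.honcho.dev"):
        self.base_url = base_url

    def get_or_create_user(self, name: str) -> User:
        return User(name)


class AsyncHoncho(Honcho):
    # mirrors the sync client; the real thing would await HTTP calls
    async def get_or_create_user(self, name: str) -> User:
        return User(name)


# sync usage
client = Honcho()
user = client.get_or_create_user("alice")
session = user.create_session()


# async usage
async def main() -> Session:
    aclient = AsyncHoncho()
    auser = await aclient.get_or_create_user("bob")
    return auser.create_session()


asession = asyncio.run(main())
```

The point of the pattern is that `user.create_session()` needs no IDs passed around by hand, at the cost of a larger object graph to document.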
# Guiding Questions

Before evaluating the below platforms I wanted to investigate a few questions I had about how to design SDKs and how they are generally maintained in other organizations. I've also included some questions I want to think about when
@ -107,9 +104,7 @@ Platform Specific Questions
3. How easy was it to use the tool?
4. What approach does the tool take? Object-oriented or singleton?
5. How does it handle async vs sync interfaces?

# Research

> First I took a look at sources and posts online that talk in general about developing SDKs. This isn't an exhaustive look at every link I looked at, but ones I thought were relevant. The notes are messy and not necessarily fully
@ -173,8 +168,7 @@ the end.
At the time of this research there was no follow-up post.

[Ask HN: Best practices (and examples) for designing client libraries for APIs?](https://news.ycombinator.com/item?id=23283551)

The first comment actually advocates for an object-oriented model but just using the top level client object for authentication and setup stuff.
@ -298,16 +292,13 @@ Some key insights
- Have modular design patterns that make it easy to extend and pick and choose features.

[Should I implement OOP in a REST API?](https://www.reddit.com/r/flask/comments/1755ob0/should_i_implement_oop_in_a_rest_api/)

Most people seem to be saying a full OOP method is overkill, but there are people advocating for having a controller class with methods that take data objects as inputs. Essentially advocating for the singleton approach with data-only objects.
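The two camps discussed above can be sketched side by side for the same operation. This is an illustrative comparison with hypothetical names, not any particular SDK's API: a flat "singleton" client taking data-only objects versus an object-oriented client returning rich resource objects.

```python
# Contrast sketch (hypothetical names): singleton-style flat client with
# data-only objects vs. object-oriented client with rich resource objects.
from dataclasses import dataclass


# --- Singleton / flat-client style ---
@dataclass
class UserData:
    user_id: str


class FlatClient:
    def create_session(self, user: UserData) -> dict:
        # one method per endpoint; all state flows through the arguments
        return {"user_id": user.user_id, "session": 1}


# --- Object-oriented style ---
class OOSession:
    def __init__(self, user_id: str):
        self.user_id = user_id


class OOUser:
    def __init__(self, user_id: str):
        self.user_id = user_id

    def create_session(self) -> OOSession:
        # the resource object already knows its own identifiers
        return OOSession(self.user_id)


class OOClient:
    def get_user(self, user_id: str) -> OOUser:
        return OOUser(user_id)


flat = FlatClient().create_session(UserData("alice"))
oo = OOClient().get_user("alice").create_session()
```

The flat style is trivially greppable (one class holds every endpoint), while the object-oriented style reads more naturally but spreads behavior across many classes.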
## Analysis

Many of the generic concerns of SDK design do not have to do with the UX of the SDK for the end developer, but rather the background processes that an SDK handles. This includes:
@ -339,18 +330,12 @@ but the object-oriented approach may not be a readable, and it could be unclear
what methods are doing in complex codebases. Even GPT-4 couldn't decide between the two.



Again and again, the best way to approach SDK development is to just do whatever is easier, and create tons of documentation that will help developers navigate your [API Ladder](https://blog.sbensu.com/posts/apis-as-ladders/). Someone will get confused regardless of what you do, so the key is to make sure the SDK makes sense (even if it's not the most efficient or clean) and remove hurdles for users to navigate errors and mistakes.

# SDK Generation Platforms

With a sense of the best standards for SDK design and additional features that should be supported in the SDK I want to look at a few different options to determine what is the best solution to go with.
@ -364,9 +349,7 @@ Below is a list of the different platforms I wanted to review
I was using the OpenAPI Spec for Honcho that was housed at https://demo.honcho.dev/openapi.json.

## Stainless

Since the hacker news thread for the release of Stainless is what spurred this research I decided to try them out first.
@ -381,9 +364,7 @@ of the interface. There was also built-in capabilities for retries, pagination,
and auth.

There's also capability for adding custom code such as utility functions.
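Retries are a good example of what these generators bake in so application code doesn't have to. Here is a minimal sketch, not Stainless's actual implementation, of the retry-with-exponential-backoff wrapper that generated clients typically put around every request.

```python
# Minimal sketch (not any vendor's real code) of retry-with-backoff logic
# of the kind SDK generators build into their clients.
import random
import time


def request_with_retries(send, max_retries: int = 3, base_delay: float = 0.1):
    """Call `send()` until it succeeds or retries are exhausted.

    `send` is any zero-arg callable that raises ConnectionError on a
    retryable failure.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # full-jitter exponential backoff: base, 2*base, 4*base ...
            time.sleep(base_delay * (2 ** attempt) * random.random())


calls = {"n": 0}


def flaky():
    # simulated endpoint that fails twice before succeeding
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = request_with_retries(flaky)  # succeeds on the third attempt
```

Generated clients usually expose knobs like `max_retries` in the client constructor rather than per call; the wrapper itself stays invisible to the end developer.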
## Speakeasy

Speakeasy required me to do everything locally through their `brew` package. It did not immediately accept the OpenAPI Spec and required me to make some tweaks. These were low-hanging fruit, and their CLI has a handy AI tool that will
@ -397,9 +378,7 @@ The generated SDK didn't feel as strong as the stainless one. There didn't seem
to be support for `async` methods; it did not use `pydantic` and used the built-in Python `@dataclass`. The methods had really unwieldy names, and it looked like it would need a lot of tweaking to get it more production ready.

## Liblab

Liblab also had me do the generation from the CLI using their npm package. It was pretty straightforward to login and give it an API spec. Liblab seems to require a lot of tweaking to get better results. It gave me several warnings asking me to
@ -414,8 +393,7 @@ which seems to be the industry standard for codegen tools. The method names
were also unwieldy. It also didn't make use of pydantic and instead implemented its own `BaseModel` class. It was built on the `requests` library and doesn't seem to support `async` methods.

## OpenAPI Generator

This is the only one on the list that is not expressly backed by a company whose main goal is SDK generation. It is however a very popular project with
@ -435,9 +413,7 @@ Once again, the sdk use the `singleton` approach.
I also did not see any indication of functionality for retry logic, authentication, or pagination.
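For reference, running OpenAPI Generator against a spec like Honcho's is a one-liner; the output paths below are illustrative, and this assumes the standard `generate` command of `openapi-generator-cli` with its `-i`/`-g`/`-o` flags.

```shell
# Generate a Python client from the hosted OpenAPI spec.
# -i: input spec, -g: target generator/language, -o: output directory
npx @openapitools/openapi-generator-cli generate \
  -i https://demo.honcho.dev/openapi.json \
  -g python \
  -o ./honcho-python-sdk
```

Everything beyond this raw generation step (retries, auth helpers, pagination) would then be on you to layer in, which is exactly the gap noted above.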
## Conclusion

Overall, Stainless had the results that I liked the most. With almost no work from me, it produced a high quality SDK that designed things in a sensible way with many built-in features such as retries, pagination, and auth.
@ -459,9 +435,7 @@ What I'm looking for right now is the platform or tool that can reduce my work
the most and let me focus on other things, and Stainless achieved that. The results are not perfect, but it doesn't look like it'll need more than some slight tweaking and testing to get to a state I want.

# Results

After reaching the conclusion in the previous section, I took some time to fully implement Stainless to make SDKs for Honcho and am proud to announce the release of a new Python SDK, and the launch of a brand-new NodeJS SDK.
@ -1,5 +1,5 @@
---
title: "ARCHIVED: Honcho: User Context Management for LLM Apps"
enableToc: true
date: 01.18.24
tags:
@ -8,34 +8,45 @@ tags:
- philosophy
- ml
- announcements
- archive
author: Courtland Leer & Vince Trost
description: Introducing Honcho, an open-source user context management framework for LLM applications that enables personalized, user-first AI experiences at scale.
---

> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
>
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This is the [Honcho](https://honcho.dev) origin story--our first public announcement of the project.
>
> We first pitched it as "an open-source version of the OpenAI Assistants API" for managing AI app data on a per-user basis. The architecture described here has evolved into Honcho's current "[[Beyond the User-Assistant Paradigm; Introducing Peers|peer paradigm]]," which unifies users & AI agents as Peers & supports much more sophisticated memory, continual learning, & [[Memory as Reasoning|powerful reasoning]].
>
> But this post also captures Honcho's founding vision: that the "missing piece of the stack" was user context, that LLMs are uniquely suited to get to know users in ways traditional software couldn't, & that personalization would be table stakes for AI apps.
>
> If you want to understand where Honcho came from & why we built it, start here.
>
> Enjoy.

![[missing_piece.png]]
*The missing piece of the stack*

# TL;DR

*Today we drop the first release of a project called [Honcho](https://github.com/plastic-labs/honcho/tree/main), an open-source version of the OpenAI Assistants API. Honcho manages your AI app data on a per-user basis, allowing for multiple concurrent sessions. Glaringly absent from the existing stack, Honcho will, at full maturity, usher the advent of atomic, disposable agents that are user-first by default.*

# Plastic Lore

[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology. Our first cycle focused on how the incentive mechanisms and data availability made possible by distributed ledgers might be harnessed to improve learning outcomes. But with the advent of ChatGPT and a chorus of armchair educators proclaiming tutoring solved by the first nascent consumer generative AI, we shifted our focus to large language models. ^09f185

As a team with backgrounds in both machine learning and education, we found the prevailing narratives overestimating short-term capabilities and under-imagining long-term potential. Fundamentally, LLMs were and still are 1-to-many instructors. Yes, they herald the beginning of a revolution in personal access not to be discounted, but every student is still ultimately getting the same experience. And homogenized educational paradigms are by definition under-performant on an individual level. If we stop here, we're selling ourselves short.

![[zombie_tutor_prompt.jpg]]
*A well-intentioned but monstrously deterministic [tutor prompt](https://www.oneusefulthing.org/p/assigning-ai-seven-ways-of-using).* ^dfae31

Most EdTech projects we saw emerging actually made foundation models worse by adding gratuitous lobotomization and coercing deterministic behavior. The former stemmed from the typical misalignments plaguing EdTech, like the separation of user and payer. The latter seemed to originate with deep misunderstandings around what LLMs are and continues to translate to huge missed opportunities.

So we set out to build a non-skeuomorphic, AI-native tutor that put users first. The same indeterminism so often viewed as LLMs' greatest liability is in fact their greatest strength. Really, it's what they _are_. When great teachers deliver effective personalized instruction, they don't consult some M.Ed flowchart, they leverage the internal personal context they have on the student and reason (consciously or basally) about the best pedagogical intervention. LLMs are the beginning of this kind of high-touch learning companion being _synthetically_ possible.

![[teacher_shoggoth.png]]
*We're not so different after all ([@anthrupad](https://twitter.com/anthrupad)).*

Our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free--precisely because we built [cognitive architectures](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) that mimic the theory-of-mind expertise of highly efficacious 1:1 instructors.

# Context Failure Mode

But we quickly ran up against a hard limitation. The failure mode we believe all vertical-specific AI applications will eventually hit if they want to be sticky, paradigmatically different than their deterministic counterparts, and realize the latent potential. That's context, specifically user context--Bloom didn't know enough about each student.

We're consistently blown away by how many people don't realize large language models themselves are stateless. They don't remember shit about you. They're just translating context they're given into probable sequences of tokens. LLMs are like horoscope writers, good at crafting general statements that *feel* very personal. You would be too, if you'd ingested and compressed that much of the written human corpus.
@ -53,9 +64,7 @@ The real magic of 1:1 instruction isn't subject matter expertise. Bloom and the
|
|||||||
Large language models can be good at this too. With similar compression and generation abilities, they're uniquely suited (among existing technology) to get to know you. We really can have shared culture and relationships with LLMs, absent (if we like) any cringy anthropomorphism.
Bloom needed a mechanism to harvest and utilize more context about the student. So we built it one.

# Research Solutions

Prediction algorithms have become phenomenal at hacking attention using tabular engagement and activity data. But if we're thinking LLM-natively, a few questions emerge:
1. How are LLMs uniquely positioned to understand users?
*A [predictive coding inspired metacognitive architecture](https://youtu.be/PbuzqCdY0hg?feature=shared), from our research.*
We added it to Bloom and found the missing piece to overcoming the failure mode of user context. Our tutor could now learn about the student and use that knowledge effectively to produce better learning outcomes.

# Blast Horizon

Building and maintaining a production-grade AI app for learning catapulted us to this missing part of the stack. Lots of users, all growing in unique ways, all needing personalized attention that evolved over multiple long-form sessions, forced us to confront the user context management problem with all its thorny intricacy and potential.

And we're hearing constantly from builders of other vertical-specific AI apps that personalization is the key blocker. In order for projects to graduate from toys to tools, they need to create new kinds of magic for their users. Mountains of mostly static software exist to help accomplish an unfathomable range of tasks, and much of it can be personalized using traditional (albeit laborious for the user) methods. But LLMs can observe, reason, then generate the software _and the user context_, all abstracted away behind the scenes.

Imagine online stores generated just in time for the home improvement project you're working on; generative games with rich multimodality unfolding to fit your mood on the fly; travel agents that know itinerary needs specific to your family, without being explicitly told; copilots that think and write and code not just like you, _but as you_; disposable, atomic agents with full personal context that replace your professional services--_you_ with a law, medical, accounting degree.
This is the kind of future we can build when we put users at the center of our agent and LLM app production.

# Introducing Honcho

^a9d0f8
So today we're releasing the first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. ^8c982b
Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md).
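As a rough illustration of what managing data "on a per-user basis" means, here's a minimal in-memory sketch of an app → user → session → message hierarchy. The class and method names are ours for illustration only, not the actual Honcho SDK surface--see the linked README for the real interface:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    is_user: bool   # True for user turns, False for assistant turns
    content: str

@dataclass
class Session:
    id: str
    messages: list = field(default_factory=list)

@dataclass
class User:
    id: str
    sessions: dict = field(default_factory=dict)

class App:
    """Every read/write is keyed by user id, so state never leaks across users."""
    def __init__(self) -> None:
        self.users: dict = {}

    def get_or_create_user(self, user_id: str) -> User:
        return self.users.setdefault(user_id, User(id=user_id))

app = App()
alice = app.get_or_create_user("alice")
session = alice.sessions.setdefault("s1", Session(id="s1"))
session.messages.append(Message(is_user=True, content="Help me plan a study schedule"))
```

With that scoping in place, concurrency becomes mostly a storage problem: two users writing at once touch disjoint keys.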
We spent lots of time building the infrastructure to support multiple concurrent users with Bloom, and too often we see developers running into the same problem: building a fantastic demo, sharing it with the world, then inevitably taking it down because of infrastructure/scaling issues.
Honcho allows you to deploy an application with a single command that can automatically handle concurrent users. Speedrunning to production is now only limited by the amount of spend you can handle, not tedious infrastructure setup.
Managing app data on a per-user basis is the first small step in improving how devs build LLM apps. Once you define a per-user data management schema, lots of new possibilities emerge around what you can do with intra-user messages, intra-user sessions, and even intra-user sessions across an ecosystem of agents.

# Get Involved
We're excited to see builders experiment with what we're releasing today, and with Honcho as it continues to evolve.
Check out the [GitHub repo](https://github.com/plastic-labs/honcho) to get started and join our [Discord](https://discord.gg/plasticlabs) to stay up to date 🫡.
---
title: "ARCHIVED: Introducing Honcho's Dialectic API"
date: 03.26.24
tags:
- dev
- ml
- announcements
- blog
- archive
author: Courtland Leer, Vince Trost, & Vineeth Voruganti
description: Announcing the Dialectic API--an LLM-native endpoint enabling agent-to-agent chat in natural language for dynamic user personalization.
---

> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post announced Honcho's Dialectic API--an LLM-native endpoint for just-in-time agent-to-agent context queries in natural language. This endpoint has since evolved into the much more powerful `.chat` method in Honcho today. The Dialectic API was ahead of its time, and its successor remains state-of-the-art.
>
> Here we lay out the reasoning behind the development of this feature. We get into the case for natural language as a substrate for agent coordination, the argument that rigid API specs constrain what's now possible, & a vision of agents collaboratively reasoning about how to personalize UX--all thinking that's shaped everything we've built since.
>
> Enjoy.

![[agent_dialectics.jpeg]]
# TL;DR

*Our [Dialectic API](https://docs.honcho.dev/guides/dialectic-endpoint) is an LLM-native way for your AI application to discuss user context with Honcho. It allows for direct LLM-to-LLM communication in natural language.*

*Agents need ways to interface dynamically and autonomously, free from the rigidness of traditional APIs. We're building that substrate.*

# What's a Dialectic API?

[Honcho](https://honcho.dev) is our platform for personalizing agents to users. Currently, it includes [[ARCHIVED; Honcho; User Context Management for LLM Apps#^a9d0f8|session storage]], BYO context storage, passive [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind]] user modeling, and *now* an agent deeply coupled to all of that rich user context. That agent can be called via our Dialectic API to surface user data for use with any cognitive architecture.

## How It Works

In designing an LLM pipeline and an application's cognitive architecture, you'll need to decide where and how to inject personal user context so the task is [[Machine learning is fixated on task performance|not simply completed in a general way]], but in the most appropriate way for [[ARCHIVED; User State is State of the Art|each specific user]].
That's when your agent asks Honcho for what it needs in natural language. This query can take many forms. Some possibilities:
- A static fact about user identity
- A piece of user data to use in improving your app's overall vertical or user-specific service
Key to note here is the ability to hard-code the most useful type of Honcho query for your app's use case *or*--better yet--to [[On intellectual respect|trust your agent]] to reason autonomously about what it needs based upon the current session (or any other criteria) and feed that to Honcho. Or run a hybrid approach. This can be done synchronously with an inference/session or async as needed.

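A sketch of those three querying styles, with a plain dict standing in for the request payload--the field names and the stubbed "agent" are illustrative, not the actual Dialectic API shape:

```python
def build_query(user_id: str, question: str) -> dict:
    """Package a free-form natural-language question about a user."""
    return {"user_id": user_id, "query": question}

# 1. Hard-coded: the app always asks the same question each session.
hard_coded = build_query("user-123", "What does this student struggle with most?")

# 2. Agent-generated: the app's own LLM decides what it needs right now
#    (stubbed here with a template standing in for an inference call).
def agent_chooses_question(session_topic: str) -> str:
    return f"What context about this user is most relevant to {session_topic}?"

dynamic = build_query("user-123", agent_chooses_question("cell biology review"))

# 3. Hybrid: fall back to the hard-coded question if the agent abstains.
chosen = agent_chooses_question("cell biology review") or hard_coded["query"]
hybrid = build_query("user-123", chosen)
```

The design choice worth noticing: because the `query` field is free-form natural language, swapping between these styles changes no schema, only the string you send.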
In this way, Honcho becomes a self-improving oracle to the identity of each and every one of your app's users. Any agent can chat with a representation of a user (as Honcho) on the backend.

Honcho responds to queries in the same format--natural language. Most simply, this is just a conversation between two agents, *collaboratively* reasoning about the best way to personalize UX. Agent-to-agent chat over users.
In the coming weeks, we'll release a number of off-the-shelf options to plug into any cognitive architecture and demos to illustrate more custom utility. We expect to see (and are already seeing in [our private beta](https://plasticlabs.typeform.com/honchobeta)) many novel ways to prompt Honcho effectively.

## Why We Built It
Why is a dialectic API the right way to solve the problem of user context in LLM applications?
Not only is it ideal from a development and design perspective, it's optimal for the particular task of personal context and user identity.

### The DevEx Case

^a14c2f
Our Dialectic API is a single endpoint for everything personalization.

It reduces development overhead and allows you to get a personalized application running quickly and efficiently--speedrunning to production.
Further, when agents can communicate directly using natural language, there's no need to learn and manage a complicated API specification. Or for us to build one. Since LLMs are proficient at interpreting the intricacies of natural language, there's a functionally infinite number of ways to ask Honcho a question and get a satisfactory result. Far superior to brittle, strict legacy APIs.

However, this doesn't mean the developer now needs to be a prompting expert, fluent in all its esoterica. Honcho is an expert in personal context and theory of mind reasoning, so your prompts can be adaptive and ad hoc, and Honcho will figure out the rest. When you're ready, you can even offload the queries to your app-side LLM.

### The ML Case

^x7f7f8
Extra context improves user response generation--the more specific, the better. Focus on ML to crush your vertical; let Honcho personalize it by default.

#### Leverage Natural Language Plasticity

#### Leverage Natural Language Plasticity

Each user has a [[ARCHIVED; User State is State of the Art#^5bc20b|rich and complex personal identity]]. Access to higher-fidelity representations of that identity can be combined with the task completion context of your app in each moment to generate the most optimal tokens for each user-agent interaction. I.e. ones that are felt by the user to be [[Humans like personalization|more personalized and satisfactory]]--enhancing the real and perceived time to value ratio of your app.
But that complexity is hard to capture and needlessly constrained with typical API design. In order to express the nuance of personal context, we need the high variance, dynamic nature of natural language.
Because LLMs consider tokens in relation to a vast [[LLMs excel at theory of mind because they read|human narrative space]], we're much closer to *semantic* machine understanding than ever. Personal context allows you to target parts of the latent space most useful in generating tokens for specific users in specific settings. The only way we know to communicate and leverage that depth is with the inherent diversity of natural language...which is itself evolutionarily optimized to describe human identity well.
Way richer than running RAG over a vector store of session logs. Or stateless CRUD-inspired API spec.

#### Out-Compete Foundation Models

Honcho's Dialectic API also allows you to build training examples with rich theory of mind context. Those datasets can help you outperform foundation models in your specific vertical and its set of tasks.
By adding additional context to inputs, the distribution of responses your model samples from can be improved. Any sort of "reasoning" the language model exhibits in a single inference is due to learned patterns in the dataset. So if you can create examples that can help it learn better patterns, you can improve the "reasoning" steps it exhibits.
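A toy sketch of what "adding additional context to inputs" might look like when assembling such a training example--the prompt/completion layout and all strings here are invented for illustration, not our actual dataset format:

```python
def build_training_example(tom_context: str, user_message: str, ideal_response: str) -> dict:
    """Prepend theory-of-mind context so the target completion is conditioned on user state."""
    prompt = (
        f"User context: {tom_context}\n"
        f"User: {user_message}\n"
        "Assistant:"
    )
    return {"prompt": prompt, "completion": " " + ideal_response}

example = build_training_example(
    tom_context="Student is anxious about exams and responds well to encouragement.",
    user_message="I don't think I can learn all of this in time.",
    ideal_response="Let's break it into small wins--which topic feels hardest right now?",
)
```

The same user message with different context yields a different target completion, which is exactly how richer inputs reshape the distribution the model learns to sample from.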
Ultimately, we're learning ways of responding that foundation models won't. Using theory of mind context yields more specific examples, which allows more robust domain-specific training.

## Why "Dialectic"?

In the classical sense, a *dialectic* process is one where two parties seek to arrive at the truth via reasoned dialogue.
(In our case, the truth is a solution for delivering the optimal per-app, per-user, per-session experience.)
We've termed our API this way because not only is it communication between software systems, but it's a reasoned discourse between agents to reach the ideal conclusion.
Each agent has a different set of information; free discussion allows them to eliminate that asymmetry and arrive at a synthesis greater than its parts. One agent is expert in delivering a service in its vertical, the other in modeling user identity and surfacing relevant, timely context based on that representation.

# The Agentic Substrate

Our Dialectic API is part of an evolutionary lineage. One that records humanity's slow discovery of all the ways machines can communicate with one another--from telegraph and punch cards to REST and GraphQL. Along each axis of typical machine comm improvement, agent-to-agent dialectics offer advantages:
- **Speed** - user time to value can be optimized with granular personal context requests
As the commodification of inference and intelligence is coupled with growing general foundation model capability, application developers will naturally be pushed toward greater and greater vertical specificity. This will drive the development of increasingly atomic agents, ones that excel at very narrow tasks.

This explosion of agent microservices will have to include the evolution of systems for agent-agent communication and transaction. If agents are going to collaborate and get shit done for us, they need native ways to communicate. Beautifully, LLMs share with us and among themselves the universal interface of natural language.

We can leverage this substrate for agent coordination with more depth and nuance than fragile trad API design. Doubtless, categories of agents will find more efficient symbol structures for cooperation in specific, repetitive cases. But discourse in natural language always remains available as a rich foundational protocol. And as we've explored, it's the ideal starting place for transmitting insights about human identity.

This is just the start. Just like you can append memory and tools to an LLM, we can augment this substrate in a number of ways--from designing multi-party protocols, to enabling zero-knowledge or confidential environments, to recording transactional data on blockchains or other types of public or private immutable ledgers.

That kind of richness puts us one step closer to the dream of a semantic web, one as replete with meaning as the physical world *and* machine grokkable. What *matters* to me can be used to personalize an atomic agent *just in time*, without sacrificing important context. Intelligent microservices can be more aligned with me than human economic actors and professional services, which are plagued with high-latency interest misalignment and information asymmetry.
Honcho and agent dialectics can eliminate the principal-agent problem for this new economic paradigm, digitally extending human agency and identity further than ever before.

# Private Beta

Our Dialectic API is now available in private beta.
We're working closely with a diverse array of projects across many different verticals in various stages of development--from ideation to production.
If you're excited to build with a hosted version of Honcho and explore the ideas covered here, [sign up for our waitlist](https://plasticlabs.typeform.com/honchobeta).

And in the meantime, [join our Discord](https://discord.gg/plasticlabs) and tell us what you're working on!
---
|
---
|
||||||
title: Memories for All
|
title: "ARCHIVED: Memories for All"
|
||||||
date: 02.15.24
|
date: 02.15.24
|
||||||
tags:
|
tags:
|
||||||
- blog
|
- blog
|
||||||
@@ -7,28 +7,36 @@ tags:
|
|||||||
- announcements
|
- announcements
|
||||||
- philosophy
|
- philosophy
|
||||||
- ml
|
- ml
|
||||||
|
- archive
|
||||||
|
author: Courtland Leer
|
||||||
|
description: An open-source reimplementation of OpenAI's memory features using Honcho, enabling any AI app to derive & store personal context about users.
|
||||||
---
|
---
|
||||||
## TL;DR
|
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
|
||||||
|
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
|
||||||
Personalization is the next frontier. OpenAI gets it:
|
>
|
||||||
|
> This post was our response to OpenAI announcing "memory" in ChatGPT--we built an open-source reimplementation using [Honcho](https://honcho.dev) to show anyone could add superior user memory to their apps. The specific LangChain patterns & code examples here are long outdated; Honcho is much more powerful & the architecture has matured significantly (dig into that [here](https://docs.honcho.dev), [[Beyond the User-Assistant Paradigm; Introducing Peers|here]], & [[Memory as Reasoning|here]]).
|
||||||
|
>
|
||||||
|
> A key prediction discussed here turned out to be remarkably prescient: walled gardens will seek to lock user context inside their ecosystems, leaving independent developers & privacy-conscious users out in the cold. And we argued for generative personalization--letting LLMs autonomously decide what matters about users rather than rigidly prescribing it--another Plastic thesis that's winning out.
|
||||||
|
>
|
||||||
|
> Enjoy.
|
||||||
|
# TL;DR
|
||||||
|
*Personalization is the next frontier. OpenAI gets it:*
|
||||||
|
|
||||||
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">We’re testing ChatGPT's ability to remember things you discuss to make future chats more helpful. <br><br>This feature is being rolled out to a small portion of Free and Plus users, and it's easy to turn on or off. <a href="https://t.co/1Tv355oa7V">https://t.co/1Tv355oa7V</a> <a href="https://t.co/BsFinBSTbs">pic.twitter.com/BsFinBSTbs</a></p>— OpenAI (@OpenAI) <a href="https://twitter.com/OpenAI/status/1757469997742666052?ref_src=twsrc%5Etfw">February 13, 2024</a></blockquote>
|
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">We’re testing ChatGPT's ability to remember things you discuss to make future chats more helpful. <br><br>This feature is being rolled out to a small portion of Free and Plus users, and it's easy to turn on or off. <a href="https://t.co/1Tv355oa7V">https://t.co/1Tv355oa7V</a> <a href="https://t.co/BsFinBSTbs">pic.twitter.com/BsFinBSTbs</a></p>— OpenAI (@OpenAI) <a href="https://twitter.com/OpenAI/status/1757469997742666052?ref_src=twsrc%5Etfw">February 13, 2024</a></blockquote>
|
||||||
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
|
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
|
||||||
|
|
||||||
Super exciting.
|
*Super exciting.*
|
||||||
|
|
||||||
But what about the rest of us?
|
*But what about the rest of us?*
|
||||||
|
|
||||||
Welp, we built an open source reimplementation of OpenAI's 'memory' features using [Honcho](https://honcho.dev) to effortlessly organize sessions on a per-user basis.
|
*Welp, we built an open source reimplementation of OpenAI's 'memory' features using [Honcho](https://honcho.dev) to effortlessly organize sessions on a per-user basis.*
|
||||||
|
|
||||||
You can derive facts about users, store them, and retrieve them for later use. And we're shipping a demo of this implemented with the useful abstractions LangChain provides.
|
*You can derive facts about users, store them, and retrieve them for later use. And we're shipping a demo of this implemented with the useful abstractions LangChain provides.*
|
||||||
|
|
||||||
The user context rabbithole goes deep--this is still just the start.
|
*The user context rabbithole goes deep--this is still just the start.*
|
||||||
|
|
||||||
If you're building with or adjacent to Honcho, [join our Discord](https://discord.gg/plasticlabs), we'd love to help 🫡.
|
|
||||||
|
|
||||||
## OpenAI Memories
|
|
||||||
|
|
||||||
|
*If you're building with or adjacent to Honcho, [join our Discord](https://discord.gg/plasticlabs), we'd love to help 🫡.*
|
||||||
|
# OpenAI Memories
|
||||||
This week [OpenAI announced](https://openai.com/blog/memory-and-new-controls-for-chatgpt) they're testing memory in ChatGPT. Specifically this means learning about individual users in order to improve their experiences.
|
This week [OpenAI announced](https://openai.com/blog/memory-and-new-controls-for-chatgpt) they're testing memory in ChatGPT. Specifically this means learning about individual users in order to improve their experiences.
|
||||||
|
|
||||||
It's a limited initial rollout, closed under the hood, and rudimentary, but appears to include functionality for deriving facts about users from conversation history and storing those to augment later generation.
|
It's a limited initial rollout, closed under the hood, and rudimentary, but appears to include functionality for deriving facts about users from conversation history and storing those to augment later generation.
|
||||||
@@ -38,9 +46,7 @@ There are features for users to view derived facts (memories), prune them, or tu
|
|||||||
They're betting, we believe correctly, that the real potential here is a wealth of agents whose behavior is in *high-fidelity with user identity*.
|
They're betting, we believe correctly, that the real potential here is a wealth of agents whose behavior is in *high-fidelity with user identity*.
|
||||||
|
|
||||||
We're pumped to see experiments like this taking place. But what if you're a developer that doesn't want to subscribe to this kind of platform dependency and all its attendant externalities? What if you're a user who wants independent or open source apps with a more mature version of these UX benefits?
|
We're pumped to see experiments like this taking place. But what if you're a developer that doesn't want to subscribe to this kind of platform dependency and all its attendant externalities? What if you're a user who wants independent or open source apps with a more mature version of these UX benefits?
|
||||||
|
# Context is Critical
|
||||||
## Context is Critical
|
|
||||||
|
|
||||||
At [Plastic Labs](https://plasticlabs.ai), our mission is to enable rich user memory in and across every application. Only then will we really understand just how augmentative and transformative these agents can be. We've been laser-focused on this problem.
|
At [Plastic Labs](https://plasticlabs.ai), our mission is to enable rich user memory in and across every application. Only then will we really understand just how augmentative and transformative these agents can be. We've been laser-focused on this problem.
|
||||||
|
|
||||||
![[laser_eyes_user_soapbox.png]]
|
![[laser_eyes_user_soapbox.png]]
|
||||||
@@ -49,16 +55,13 @@ Right now, the vast majority of software UX is a 1-to-many experience. What you
|
|||||||
|
|
||||||
AI apps can deal *generatively* with each user on an individual basis, that is, an experience can be produced ad hoc for every user upon every interaction. From 1:many to 1:1 without prohibitive sacrifices in efficiency. But we're still underestimating the full scope of possibility here.
|
AI apps can deal *generatively* with each user on an individual basis, that is, an experience can be produced ad hoc for every user upon every interaction. From 1:many to 1:1 without prohibitive sacrifices in efficiency. But we're still underestimating the full scope of possibility here.
|
||||||
|
|
||||||
As it stands today the space is mostly focused on the (albeit generative) [[Machine learning is fixated on task performance|1:many tasks LLMs can perform]]. The apps remain more or less stateless with regard to the user. To reach 1:1 nirvana, we need more [[Honcho; User Context Management for LLM Apps|user-centric agent design]]. We need frameworks, mechanisms, services, models dedicated to deep coherence with user identity.
|
As it stands today the space is mostly focused on the (albeit generative) [[Machine learning is fixated on task performance|1:many tasks LLMs can perform]]. The apps remain more or less stateless with regard to the user. To reach 1:1 nirvana, we need more [[ARCHIVED; Honcho; User Context Management for LLM Apps|user-centric agent design]]. We need frameworks, mechanisms, services, models dedicated to deep coherence with user identity.
|
||||||
|
|
||||||
Every agent interaction can be generated just in time for every person, informed by relevant personal context more substantive than human-to-human sessions. User context will enable disposable agents on the fly across verticals for lower marginal cost than 1:many software paradigms.
|
Every agent interaction can be generated just in time for every person, informed by relevant personal context more substantive than human-to-human sessions. User context will enable disposable agents on the fly across verticals for lower marginal cost than 1:many software paradigms.
|
||||||
|
|
||||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/tTE3xiHw4Js?si=uzUzcSHFfZdjFduX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
|
<iframe width="560" height="315" src="https://www.youtube.com/embed/tTE3xiHw4Js?si=uzUzcSHFfZdjFduX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
|
||||||
|
|
||||||
(*Here's our co-founder [Vince](https://twitter.com/vintrotweets) talking more about some of those possibilities*)
|
(*Here's our co-founder [Vince](https://twitter.com/vintrotweets) talking more about some of those possibilities*)
|
||||||
|
# "Open" vs "Closed"
|
||||||
## "Open vs Closed"
|
|
||||||
|
|
||||||
We subscribe heavily to the spirit of arguments Harrison Chase made in ["OpenAI's Bet on Cognitive Architecture"](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) just a few months ago:
|
We subscribe heavily to the spirit of arguments Harrison Chase made in ["OpenAI's Bet on Cognitive Architecture"](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) just a few months ago:
|
||||||
|
|
||||||
> There’s a great quote from Jeff Bezos that says to [only do what makes your beer taste better](https://blog.weaverse.io/make-your-beer-taste-better?ref=blog.langchain.dev). This refers to early industrial revolution, when breweries were also making their own electricity. A breweries ability to make good beer doesn’t really depend on how differentiated their electricity was - so those that outsourced electricity generation and focused more on brewing jumped to an advantage.
|
> There’s a great quote from Jeff Bezos that says to [only do what makes your beer taste better](https://blog.weaverse.io/make-your-beer-taste-better?ref=blog.langchain.dev). This refers to early industrial revolution, when breweries were also making their own electricity. A breweries ability to make good beer doesn’t really depend on how differentiated their electricity was - so those that outsourced electricity generation and focused more on brewing jumped to an advantage.
|
||||||
@@ -82,9 +85,7 @@ Shouldn't we be able to experiment with all this without platform lock-in, allow
|
|||||||
Developers will want control over personalization for their application without all the redundant overhead. Users will want a say in how they're being reasoned about and why.
|
Developers will want control over personalization for their application without all the redundant overhead. Users will want a say in how they're being reasoned about and why.
|
||||||
|
|
||||||
This is our vision for Honcho.
|
This is our vision for Honcho.
|
||||||
|
# Intellectual Respect
|
||||||
## Intellectual Respect
|
|
||||||
|
|
||||||
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">llms are remarkable empaths<br><br>if you’d read that much fiction, you would be too</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1753480140850626759?ref_src=twsrc%5Etfw">February 2, 2024</a></blockquote>
|
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">llms are remarkable empaths<br><br>if you’d read that much fiction, you would be too</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1753480140850626759?ref_src=twsrc%5Etfw">February 2, 2024</a></blockquote>
|
||||||
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
|
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
|
||||||
|
|
||||||
@@ -96,21 +97,18 @@ There's a ton we plan to unpack and implement there, but the key insight we're h
|
|||||||
|
|
||||||
(*If you want to go deeper into the research, [this webinar we did with LangChain](https://www.youtube.com/watch?v=PbuzqCdY0hg&list=PLuFHBYNxPuzrkVP88FxYH1k7ZL5s7WTC8) is a great place to start, as is [the "Violation of Expectations" chain they implemented](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain)*)
|
(*If you want to go deeper into the research, [this webinar we did with LangChain](https://www.youtube.com/watch?v=PbuzqCdY0hg&list=PLuFHBYNxPuzrkVP88FxYH1k7ZL5s7WTC8) is a great place to start, as is [the "Violation of Expectations" chain they implemented](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain)*)
|
||||||
|
|
||||||
|
|
||||||
This release allows you to experiment with several ideas. We feed messages into an inference asking the model to derive facts about the user, store those insights for later use, then ask the model to retrieve this context to augment some later generation.
|
This release allows you to experiment with several ideas. We feed messages into an inference asking the model to derive facts about the user, store those insights for later use, then ask the model to retrieve this context to augment some later generation.
|
||||||
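The derive-store-retrieve loop above can be sketched in a few lines. Everything here is an illustrative stub, not the real Honcho API: `derive_facts` stands in for the LLM inference and `UserContextStore` for Honcho's per-user storage.

```python
# Illustrative sketch of the derive -> store -> retrieve memory loop.
# `derive_facts` stands in for an LLM inference; `UserContextStore`
# stands in for Honcho's per-user storage. Neither is the real API.

def derive_facts(message: str) -> list[str]:
    # A real implementation prompts a model to extract facts about
    # the user; this stub keys off a single phrase for demonstration.
    facts = []
    if "guitar" in message.lower():
        facts.append("user plays guitar")
    return facts

class UserContextStore:
    """Per-user fact storage with simple retrieval."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def store(self, user_id: str, facts: list[str]) -> None:
        self._facts.setdefault(user_id, []).extend(facts)

    def retrieve(self, user_id: str) -> list[str]:
        return self._facts.get(user_id, [])

store = UserContextStore()
store.store("alice", derive_facts("I practiced guitar for an hour today"))

# Later generation: retrieved facts augment the prompt.
prompt = (
    "Known about the user: " + "; ".join(store.retrieve("alice"))
    + "\nUser: what should I do tonight?"
)
```

The interesting part is the loop itself: every new message can feed more derived facts into the store, so later prompts get progressively richer personal context.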
|
|
||||||
Check out our [LangChain implementation](https://docs.honcho.dev/how-to/personal-memory/simple-user-memory) and [Discord bot demo](https://discord.gg/plasticlabs).
|
Check out our [LangChain implementation](https://docs.honcho.dev/how-to/personal-memory/simple-user-memory) and [Discord bot demo](https://discord.gg/plasticlabs).
|
||||||
|
|
||||||
Where things get powerful is in the aggregate. What resolves is a highly insightful picture of who your users are and what they need--a key context reservoir to improve the qualitative and quantitative experience.
|
Where things get powerful is in the aggregate. What resolves is a highly insightful picture of who your users are and what they need--a key context reservoir to improve the qualitative and quantitative experience.
|
||||||
|
|
||||||
N.b. you can certainly direct the model with as much verbosity as you like, but we've found during extensive experimentation that [[Theory of Mind Is All You Need|the more you trust the model]], the better and more useful the results.
|
N.b. you can certainly direct the model with as much verbosity as you like, but we've found during extensive experimentation that [[ARCHIVED; Theory of Mind Is All You Need|the more you trust the model]], the better and more useful the results.
|
||||||
|
|
||||||
This isn't surprising when you consider how much content about what people are thinking is contained in a model's pretraining. It's led to some really exciting [emergent abilities](https://arxiv.org/abs/2302.02083).
|
This isn't surprising when you consider how much content about what people are thinking is contained in a model's pretraining. It's led to some really exciting [emergent abilities](https://arxiv.org/abs/2302.02083).
|
||||||
|
|
||||||
Give the model some trust and respect, and you'll be rewarded.
|
Give the model some trust and respect, and you'll be rewarded.
|
||||||
|
# Let's Build
|
||||||
## Let's Build
|
|
||||||
|
|
||||||
If you're experimenting with personalization, building with [Honcho](https://github.com/plastic-labs/honcho), or just interested in these ideas, [join our Discord](https://discord.gg/plasticlabs), and let's jam on what we can build together.
|
If you're experimenting with personalization, building with [Honcho](https://github.com/plastic-labs/honcho), or just interested in these ideas, [join our Discord](https://discord.gg/plasticlabs), and let's jam on what we can build together.
|
||||||
|
|
||||||
A healthy open ecosystem will include lots of projects trying lots of new ways to synthesize and leverage user context. We're here to support them all 🥽.
|
A healthy open ecosystem will include lots of projects trying lots of new ways to synthesize and leverage user context. We're here to support them all 🥽.
|
||||||
@@ -1,27 +1,39 @@
|
|||||||
---
|
---
|
||||||
title: Open-Sourcing Tutor-GPT
|
title: "ARCHIVED: Open-Sourcing Tutor-GPT"
|
||||||
date: 06.02.2023
|
date: 06.02.23
|
||||||
tags:
|
tags:
|
||||||
- blog
|
- blog
|
||||||
- bloom
|
- bloom
|
||||||
- announcements
|
- announcements
|
||||||
- pedagogy
|
- pedagogy
|
||||||
- ml
|
- ml
|
||||||
|
- archive
|
||||||
|
author: Courtland Leer & Vince Trost
|
||||||
|
description: Open-sourcing Bloom, our AI learning companion that uses metacognitive prompting to elicit pedagogical reasoning & theory-of-mind from LLMs.
|
||||||
---
|
---
|
||||||
|
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
|
||||||
|
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
|
||||||
|
>
|
||||||
|
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom to focus exclusively on Honcho.
|
||||||
|
>
|
||||||
|
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first-of-its-kind personalized AI tutor, we realized three things: (1) all agents will soon need continuous learning systems to understand their users, (2) this is an extremely hard problem that every developer shouldn't have to redundantly solve, & (3) we were uniquely positioned to solve it.
|
||||||
|
>
|
||||||
|
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
|
||||||
|
>
|
||||||
|
> We wrote the following at the very beginning of that transition. It details the benefits of early efforts at model *reasoning* to enhance personalization, architecture that would later inspire Honcho, & the massive overhang of LLM capabilities we were researching--all quite a bit ahead of its time.
|
||||||
|
>
|
||||||
|
> Enjoy.
|
||||||
|
|
||||||
![[assets/human_machine_learning.jpeg]]
|
![[assets/human_machine_learning.jpeg]]
|
||||||
|
# TL;DR
|
||||||
## TL;DR
|
|
||||||
|
|
||||||
Today we’re [open-sourcing](https://github.com/plastic-labs/tutor-gpt) Bloom, our digital [Aristotelian](https://erikhoel.substack.com/p/why-we-stopped-making-einsteins) learning companion.
|
Today we’re [open-sourcing](https://github.com/plastic-labs/tutor-gpt) Bloom, our digital [Aristotelian](https://erikhoel.substack.com/p/why-we-stopped-making-einsteins) learning companion.
|
||||||
|
|
||||||
What makes [Bloom](https://bloombot.ai/) compelling is its ability to _reason pedagogically_ about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf), enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. ^3498b7
|
What makes [Bloom](https://bloombot.ai/) compelling is its ability to *reason pedagogically* about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf), enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. ^3498b7
|
||||||
|
|
||||||
We’re not seeing this in the explosion of ‘chat-over-content’ tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don’t achieve the necessary user intimacy. Infrastructure for that doesn’t exist yet 👀.
|
We’re not seeing this in the explosion of ‘chat-over-content’ tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don’t achieve the necessary user intimacy. Infrastructure for that doesn’t exist yet 👀.
|
||||||
|
|
||||||
Our mission is to facilitate personal, [agentic](https://arxiv.org/pdf/2304.03442.pdf) AI for all. So to that end, we’re (1) releasing Bloom’s architecture into the wild and (2) embarking on a journey to supercharge the kind of empowering generative agents we want to see in the world.
|
Our mission is to facilitate personal, [agentic](https://arxiv.org/pdf/2304.03442.pdf) AI for all. So to that end, we’re (1) releasing Bloom’s architecture into the wild and (2) embarking on a journey to supercharge the kind of empowering generative agents we want to see in the world.
|
||||||
|
# Neo-Aristotelian Tutoring
|
||||||
## Neo-Aristotelian Tutoring
|
|
||||||
|
|
||||||
Right now, Bloom is a reading comprehension and writing workshop tutor. You can chat with it in [Discord](https://discord.gg/bloombotai). After supplying it a passage, Bloom can coach you toward understanding or revising a piece of text. It does this by treating the user as an equal, prompting and challenging socratically.
|
Right now, Bloom is a reading comprehension and writing workshop tutor. You can chat with it in [Discord](https://discord.gg/bloombotai). After supplying it a passage, Bloom can coach you toward understanding or revising a piece of text. It does this by treating the user as an equal, prompting and challenging socratically.
|
||||||
|
|
||||||
We started with reading and writing in natural language because (1) native language acumen is the symbolic system through which all other fluencies are learned, (2) critical dialogue is the ideal vehicle by which to do this, and (3) that's what LLMs are best at right now.
|
We started with reading and writing in natural language because (1) native language acumen is the symbolic system through which all other fluencies are learned, (2) critical dialogue is the ideal vehicle by which to do this, and (3) that's what LLMs are best at right now.
|
||||||
@@ -35,10 +47,8 @@ Current compute suggests we can do high-grade 1:1 for two orders of magnitude ch
|
|||||||
It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience, yet maintain an elevated discourse.
|
It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience, yet maintain an elevated discourse.
|
||||||
|
|
||||||
So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in foundation models, but the key is eliciting it.
|
So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in foundation models, but the key is eliciting it.
|
||||||
|
# Eliciting Pedagogical Reasoning
|
||||||
## Eliciting Pedagogical Reasoning
|
|
||||||
^x527dc
|
^x527dc
|
||||||
|
|
||||||
The machine learning community has long sought to uncover the full range of tasks that large language models can be prompted to accomplish on general pre-training alone (the capability overhang). We believe we have discovered one such task: pedagogical reasoning. ^05bfd8
|
The machine learning community has long sought to uncover the full range of tasks that large language models can be prompted to accomplish on general pre-training alone (the capability overhang). We believe we have discovered one such task: pedagogical reasoning. ^05bfd8
|
||||||
|
|
||||||
Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. ^285105
|
Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. ^285105
|
||||||
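That two-step loop can be sketched minimally. Hedged heavily: `llm` is a stub standing in for a real model call, and the prompt text merely paraphrases the idea--it is not Bloom's actual metaprompt.

```python
# Sketch of the "thought, then response" chain described above.
# `llm` is a stub for a real model call; the prompt strings paraphrase
# the idea and are not Bloom's actual metaprompts.

def llm(prompt: str) -> str:
    # Stand-in for an LLM inference.
    if prompt.startswith("Assess the student"):
        return "Student is stuck on the thesis; a guiding question would help."
    return "What do you think the author is really claiming here?"

def respond(student_input: str, history: list[str]) -> str:
    # Step 1: a hidden "thought" about the learner's real-time needs.
    thought = llm(
        "Assess the student's real-time academic needs given:\n"
        + "\n".join(history + [student_input])
    )
    # Step 2: the thought frames the visible tutoring response.
    return llm(f"Pedagogical framing: {thought}\nReply to: {student_input}")

reply = respond("I don't get what this paragraph is arguing.", [])
```

The student only ever sees the second output; the first inference is the tutor quietly deciding how to teach before deciding what to say.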
@@ -73,9 +83,7 @@ Notice how Bloom reasons it should indulge the topic, validate the student, and
|
|||||||
Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin](https://discord.gg/udtxycbh).
|
Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin](https://discord.gg/udtxycbh).
|
||||||
|
|
||||||
Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompts to expand utility and support multiple domains. We’ll be doing work here too.
|
Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompts to expand utility and support multiple domains. We’ll be doing work here too.
|
||||||
|
# Bloom & Agentic AI
|
||||||
## Bloom & Agentic AI
|
|
||||||
|
|
||||||
This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in the foundation model's pre-training.
|
This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in the foundation model's pre-training.
|
||||||
|
|
||||||
After all, if it were just about content delivery, MOOCs would've solved education. We need more than that to reliably grow rare minds. And we're already seeing Bloom excel at promoting synthesis and creative interpretation within its narrow utility.
|
After all, if it were just about content delivery, MOOCs would've solved education. We need more than that to reliably grow rare minds. And we're already seeing Bloom excel at promoting synthesis and creative interpretation within its narrow utility.
|
||||||
@@ -1,57 +1,58 @@
|
|||||||
---
|
---
|
||||||
title: Solving The Campfire Problem with Honcho
|
title: "ARCHIVED: Solving The Campfire Problem with Honcho"
|
||||||
date: 03.14.2024
|
date: 03.14.24
|
||||||
tags:
|
tags:
|
||||||
- demos
|
- demos
|
||||||
- philosophy
|
- philosophy
|
||||||
- "#ml"
|
- "#ml"
|
||||||
- blog
|
- blog
|
||||||
|
- archive
|
||||||
|
author: Courtland Leer & Vince Trost
|
||||||
|
description: How Honcho's dialectic API powers a 'curation buddy' demo that learns about you over time to become a personalized intellectual companion.
|
||||||
---
|
---
|
||||||
|
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
|
||||||
|
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
|
||||||
|
>
|
||||||
|
> This post introduced our "Curation Buddy" demo--a Discord bot that used [[ARCHIVED; Introducing Honcho's Dialectic API|Honcho's Dialectic API]] (now just the `.chat` method) to become a personalized reading companion. The technical implementation details (specific API calls, architecture diagrams) reflect an earlier version of Honcho that's since evolved substantially.
|
||||||
|
>
|
||||||
|
> But the philosophical reflection on the atomization of media consumption leaving many in lonely intellectual silos & few shared narratives remains an open problem. We argued that AI companions--powered by rich user context & infra like Honcho--could help rebuild those campfires.
|
||||||
|
>
|
||||||
|
> Enjoy.
|
||||||
|
|
||||||
![[agent_campfire.webp]]
|
![[agent_campfire.webp]]
|
||||||
## TL;DR
|
# TL;DR
|
||||||
|
*Today we're releasing the first demo utilizing Honcho's dialectic API.[^1] Your LLM app/agent can now converse freely with [Honcho](https://honcho.dev)(-as-agent) about a user in natural language: agent-to-agent chat over user context.*
|
||||||
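A hypothetical sketch of that agent-to-agent exchange (not the real SDK: `DialecticAgent` and its toy matching logic stand in for Honcho-as-agent reasoning over the stored user model):

```python
# Hypothetical sketch of agent-to-agent chat over user context.
# `DialecticAgent` stands in for Honcho-as-agent; the word-matching
# below is a toy substitute for the LLM reasoning a real call performs.

class DialecticAgent:
    def __init__(self, user_facts: list[str]):
        self.user_facts = user_facts

    def chat(self, question: str) -> str:
        # Return stored facts that share a word with the question.
        words = question.lower().replace("?", "").split()
        hits = [f for f in self.user_facts
                if any(w in f.split() for w in words)]
        return "; ".join(hits) or "No relevant context yet."

honcho = DialecticAgent(["user enjoys long-form essays about cognitive science"])

# The app's own agent asks Honcho about the user before responding.
answer = honcho.chat("What does this user enjoy reading?")
```

The point is the interface: the app never touches raw user data, it just asks questions in natural language and gets back synthesized context.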
|
|
||||||
Today we're releasing the first demo utilizing Honcho's dialectic API.[^1] Your LLM app/agent can now converse freely with [Honcho](https://honcho.dev)(-as-agent) about a user in natural language: agent-to-agent chat over user context.
|
*The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[ARCHIVED; Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.*
|
||||||
|
|
||||||
The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.
|
*Our fractured media landscape is a far cry from narrative meaning making around the tribal campfire. Despite the connective power of the web, many of us subsist in lonely intellectual silos, more diverse but less fulfilling than social discourse.*
|
||||||
|
|
||||||
Our fractured media landscape is a far cry from narrative meaning making around the tribal campfire. Despite the connective power of the web, many of us subsist in lonely intellectual silos, more diverse but less fulfilling than social discourse.
|
|
||||||
|
|
||||||
We call this *The Campfire Problem* and expect to see lots of apps working to solve parts of it using generative AI, Honcho, and other emerging technologies. Hopefully today's demo affords a glimpse of what's becoming possible.
|
|
||||||
|
|
||||||
## A *Curation Buddy* Demo
|
|
||||||
|
|
||||||
|
*We call this The Campfire Problem and expect to see lots of apps working to solve parts of it using generative AI, Honcho, and other emerging technologies. Hopefully today's demo affords a glimpse of what's becoming possible.*
# A *Curation Buddy* Demo
It's a constant problem: you're dying to talk to someone about this mind-blowing thing you read, but no one else you know is into your weird shit, plus--like you--they're all drowning in infinite read-it-later hell.
Enter *Curation Buddy*.
## Overview
Curation Buddy is an LLM application. It's a Discord bot you can chat with. Share links to any text-based media and have substantive conversation.
It uses Honcho to personalize the UX. As you converse, Honcho learns about you. It reasons about the links and conversation to uncover insight into your knowledge, interests, beliefs, desires, [[ARCHIVED; User State is State of the Art|state]], etc.
This account of user state can then be leveraged by Curation Buddy to behave like a trusted, close intellectual companion.
![[curation_buddy_overview.png]]
## What the App Does
Curation Buddy will have a discussion with you about the content in links you drop into chat. It does this by generating a "thought" about your (the user's) needs and listing out any additional data it could use to better address them.
We parse out that list and loop over it, making requests to Honcho's dialectic endpoint. Honcho returns responses to those questions, which get aggregated into a list and injected as context to hydrate the prompt that Curation Buddy uses to generate the response to the user.
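Concretely, that loop can be sketched like this (the numbered-list format of the thought and the `ask_honcho` callable are assumptions for illustration; Honcho's real client API is not shown):

```python
import re

def parse_data_needs(thought: str) -> list[str]:
    """Pull the numbered 'additional data' items out of the generated thought."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+\.\s*(.+)$", thought, re.MULTILINE)]

def hydrate_prompt(base_prompt: str, thought: str, ask_honcho) -> str:
    """ask_honcho(question) stands in for one request to the dialectic endpoint,
    returning Honcho's natural-language answer about the user."""
    questions = parse_data_needs(thought)
    answers = [ask_honcho(q) for q in questions]    # one dialectic request per item
    context = "\n".join(f"- {a}" for a in answers)  # aggregate the responses
    return f"{base_prompt}\n\nWhat you know about this user:\n{context}"
```

The hydrated prompt then goes to the response model as usual; swapping in the real Honcho client only changes `ask_honcho`.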
![[curation_agent.png]]
## What Honcho Does
Concurrently, Honcho is listening for writes to its database. Once it detects a write, it fires off a callback function to derive facts about the user's message.
These facts get embedded and stored in the user's personal vector database. Then when Curation Buddy generates its list of additional info it wants to know, it sends each of those requests to Honcho and Honcho runs RAG over that personal data store. It uses the returned facts to generate a response for Curation Buddy.
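As a toy model of that write-derive-embed-retrieve pipeline (the embedding function is a deliberately crude stand-in, and none of this is Honcho's actual implementation):

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: bag-of-words bucketed by
    character-ordinal sums. Deterministic, but nothing like real semantics."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class UserFactStore:
    """Per-user store: facts derived on DB write, searched at question time."""

    def __init__(self):
        self.facts: list[tuple[str, list[float]]] = []

    def on_write(self, derived_facts: list[str]) -> None:
        # callback fired after a message lands in the database
        for fact in derived_facts:
            self.facts.append((fact, embed(fact)))

    def query(self, question: str, k: int = 3) -> list[str]:
        # retrieval step backing the dialectic answer
        q = embed(question)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]
```

In the real system the retrieved facts are handed to a model that writes the dialectic response, rather than returned raw.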
![[honcho_agent.png]]
## Feature Ideas
We'd love to see someone run with and extend this demo. Here are some further Honcho-powered feature ideas beyond today's scope:
- Personal context informed storage for web content from links
- Construct and maintain full-fledged user knowledge graphs
- Automatic bespoke summaries of links informed by graph
- Use Honcho to create training examples for [[ARCHIVED; User State is State of the Art|user-specific curation models]]
- Autonomously generated user newsletters to supplement conversations async
Further, there's plenty of comparable potential for any reading, media, learning, or companionship application.
If you're interested in building something adjacent to any of this, [hop in our Discord](https://discord.gg/plasticlabs), we'd love to support you.
# The Campfire Problem
We wanted to highlight Honcho's utility in this vertical because it's one where we simultaneously hear a lot of excitement and a lot of pain points. Clearly many are hungry for more social, better ways to consume and digest media, and optimists seem to share the intuition that AI has a role to play here.
We think Honcho and the personal context solutions it provides are the key.
## The Campfire
For most of human history, groups, tribes, nations drank from the same informational tap. In fact, when we see changes in how information flows, we see dramatic corresponding historical effects. Alterations in distribution--writing, printing, browsing, disaster--have altered the balance of power, the minds of billions, the course of civilization.
But the further step of processing that information and the shaping of it into *shared* narratives have played an equally enormous role. Narrative and meaning making are fundamentally social tasks. We still have to decide what to do with information, what it *means*, and we've generally done that with our neighbors.
Consider the campfires of hunter-gatherers, agoras of classical city-states, churches...

A majority of these social exercises deal in limited information and distribution. One or a few sources of truth to chew on with your family, friends, and colleagues. Agreed upon reality, collective processing--social instincts satisfied. You can talk to people about the world, it feels good.
But at the end of that list, distribution becomes so radically democratized that this model of collective processing starts to change dramatically.
## The Problem
In the last few decades, this unraveling has been in the acceleration phase of the graph. Sources of information are increasingly atomized, and so are the communities that process it.
As with prior changes to the modes of information distribution and narrative making, the result has been some remarkably positive--if wacky--outcomes. Equalizing individual access and voice is probably not something we want to turn the clock back on.
But we're left with a problem--many of us have gotten so siloed that we genuinely...

This isn't a new phenomenon per se, but its scale is novel and undeniable. Having just three network TV stations in the 50s might've lacked the rich diversity of today's informational landscape, but no doubt the collective campfire was burning bright, and you could talk to just about anyone to help you process the world.
But now we must all build our own campfires.
## The Solution
Generative AI gives further cause for concern. Zero-marginal-cost info *generation* along with current zero-barrier distro may be as disruptive as prior revolutions on this axis (perhaps far more). Lots of that proposition is *incredibly* exciting. But we should also expect it to exacerbate The Campfire Problem.
![[Media-Filled Cityscape Scene.webp]]
A critical component is a secure and reliable mechanism for this community of agents...
*Enter Honcho.*
[^1]: More on this & our private beta next week (!)
---
title: "ARCHIVED: Theory-of-Mind Is All You Need"
date: 06.12.23
tags:
- blog
- ml
- bloom
- pedagogy
- archive
author: Courtland Leer & Vince Trost
description: How giving LLMs autonomy to reason about user psychology through theory-of-mind predictions dramatically improves AI tutoring & learning experiences.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom to focus exclusively on Honcho.
>
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first-of-its-kind personalized AI tutor, we realized three things: (1) all agents will soon need continuous learning systems to understand their users, (2) this is an extremely hard problem that no developer should have to redundantly solve, & (3) we were uniquely positioned to solve it.
>
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
>
> We wrote the following at the very beginning of that transition. The content here gets into the emergent LLM theory of mind capabilities we were exploring at the time, agentic auto-prompting, and the positive effects of personalizing agents--all quite a bit ahead of its time.
>
> Enjoy.
# TL;DR
*Today we’re releasing a major upgrade to [Bloom](https://discord.gg/bloombot.ai) (& the open-source codebase, [tutor-gpt](https://github.com/plastic-labs/tutor-gpt)).*
*We gave our tutor even more autonomy to reason about the psychology of the user, and—using GPT-4 to dynamically rewrite its own system prompts—we’re able to dramatically expand the scope of what Bloom can do and massively reduce our prompting architecture.*
*We leaned into theory of mind experiments and Bloom is now more than just a literacy tutor, it’s an expansive learning companion.*
# Satisfying Objective Discovery
Bloom is already excellent at helping you draft and understand language. But we want it to do whatever you need.
To expand functionality though, we faced a difficult technical problem: figuring out what the learner wants to do.
The key here is they don’t have all the information—they _don’t know_ what they don’t know.

Well we know that (1) foundation models are [shockingly good](https://arxiv.org/abs/2304.11490) at [theory of mind](https://en.wikipedia.org/wiki/Theory_of_mind), (2) Bloom already excels at [pedagogical reasoning](https://twitter.com/courtlandleer/status/1664673210007449605?s=20), and (3) [autonomous agents](https://twitter.com/yoheinakajima/status/1642881722495954945?s=20) are [having early success](https://twitter.com/Auto_GPT/status/1649370049688354816?s=20), so what if we stopped trying to deterministically prescribe an indeterminant intelligence?
What if we treated Bloom with some intellectual respect? ^67d75d
# Autonomous Prompting
The solution here is scary simple. The results are scary good.
[[ARCHIVED; Open Sourcing Tutor-GPT#^285105|Here’s a description]] of the previous version’s architecture:
![[ARCHIVED; Open Sourcing Tutor-GPT#^285105]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^1e01f2]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^b1794d]]
Instead, we’ve now repurposed the ***thought*** chain to do two things:
![[assets/ToM Flow.png]]
Then we inject that generation into the body of the response chain’s system prompt. We do this with every user input. Instead of just reasoning about the learner’s intellectual/academic needs, Bloom now proactively rewrites itself to be as in-tune as possible to the learner at every step of the journey.
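In miniature, one turn of that loop looks something like this (`llm` and the prompt wording are placeholders, not Bloom's actual prompts):

```python
def tutor_turn(user_input: str, history: list[str], llm) -> str:
    """One conversational turn: run the thought chain, then rewrite the
    response chain's system prompt with the fresh theory-of-mind prediction.
    llm(system, prompt) stands in for the actual model call."""
    convo = "\n".join(history + [f"user: {user_input}"])

    # 1. Thought chain: predict the user's state and unstated needs.
    thought = llm(
        "Reason step by step about this user's psychology and what they need next.",
        convo,
    )

    # 2. Response chain: inject that prediction into the system prompt itself.
    response = llm(
        f"You are a learning companion.\n\nCurrent read on the user:\n{thought}",
        convo,
    )
    history += [f"user: {user_input}", f"assistant: {response}"]
    return response
```

The point of the design is that the system prompt is no longer static: it is rewritten from the thought chain's output on every single user input.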
# Emergent Effects
We’re seeing substantial positive behavior changes as a result of giving Bloom this kind of autonomy.
![[assets/ToM Discord 1.png]]
And Bloom is game. It’ll go down a rabbit hole with you, help you strategize...

While reducing the prompt material, we took the opportunity to remove basically all references to “tutor,” “student,” etc. We found that since Bloom is no longer contaminated by pointing at [certain averaged narratives in its pre-training](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post)—e.g. the (bankrupt) contemporary conception of what a tutor is ‘supposed’ to be—it is, ironically, a better one.
Instead of simulating a tutor, it simulates _you_.
# Coming Soon...
All this begs the question: what could Bloom do with even better theory of mind? And how can we facilitate that?
What could other AI applications do with a framework like this?
---
title: "ARCHIVED: User State is State of the Art"
date: 02.23.24
tags:
- blog
- philosophy
- demos
- ml
- archive
author: Courtland Leer & Vince Trost
description: Why modeling the complexity & plasticity of human identity is key to AI personalization, with a DSPy demo for learning user states with Honcho.
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post explores early experiments modeling user state with DSPy & [Honcho](https://honcho.dev). The specific demo & technical approach described here have been superseded by Honcho's current architecture, which now uses a unified [[Beyond the User-Assistant Paradigm; Introducing Peers|"peer" paradigm]] & far more [[Memory as Reasoning|sophisticated reasoning]].
>
> But the philosophical positioning in this post is more relevant than ever. Human identity is messy, plastic, & context-dependent. We still argue that AI systems should embrace this complexity rather than flatten it, continually learning evolving representations of personal identity.
>
> Enjoy.
# TL;DR
*LLM apps can embrace the complexity and plasticity of human identity to deliver unparalleled personalization.*
*We're introducing a framework for modeling your users automatically and dynamically. And today we have a DSPy demo to illustrate a nascent version of this paradigm.*
*All of us adopt different personas in different contexts--with [Honcho](https://honcho.dev) you can begin to learn these user states so your app can better meet user need in every moment.*
# Fleet of Theseus
A key feature of our minds is the feeling of a persistent, unitary identity. Entire religions and philosophical movements have been spawned just to jailbreak this experience.
As they all point out, identity is *way* more complicated than you think.
While we perceive psychological continuity across contexts and time, closer inspection reveals a network of branching and [[Identity is diachronic|diachronic identities]]. We adopt varied personas and play different characters in diverse settings, and we refine, optimize, and evolve that quiver of selves throughout our lives. ^5bc20b
In short, it's messy. Or, rather, elegant emergent complexity.
Each human self isn't just one mythical [Ship of Theseus](https://en.wikipedia.org/wiki/Ship_of_Theseus)--planks being replaced one by one over slow years--but a fleet of them, all with full, manual and autonomous CRUD operations.
# Digital Twins Are Naïve
So what does this mean for the problem of good UX (and alignment) in AI? If each individual is vastly complex and the industry hopes to scale to billions of users, we have a daunting task.
The knee-jerk reaction to this level of understanding is to assume the problem intractable. How can we possibly represent, much less simulate, something so enormous? Better to focus on [[Machine learning is fixated on task performance|optimizing general tasks]] like in traditional software paradigms, then serve that homogenized experience to every user (never mind missing the [[LLMs excel at theory of mind because they read|non-skeuomorphic opportunities]], we'll get to them...at some point...if they're not mirages).
Besides, surely mapping the full breadth of user identity requires much more compute...
![[escher_honcho.png]]
*[Escher](https://en.wikipedia.org/wiki/Hand_with_Reflecting_Sphere) gets it*
# Matryoshka Representation
So is representing user identity for LLM apps a problem of [computational irreducibility](https://en.wikipedia.org/wiki/Computational_irreducibility)--no shortcuts, full simulation required?
We think not.
## Social Simulacra
Consider the social cognition and theory of mind involved in getting to know someone. At first, you have no idea who tf they are or how they'll behave. You're on high alert. You (basally or consciously) notice and interpret tons of data points, you'll likely have vivid memories of these early interactions.
What's happening is your brain is constructing a model of the other person--a compressed representation. Early on, this model is pretty much the same as your model for people *like* them--a/s/l, how they look, how they dress: stereotypes. But the more data your brain gets, the more this model starts to diverge, a representational meiosis.
Pretty soon you’ve got a full-fledged simulacra of that human living rent free in your head.

In a chicken and egg situation, you're now spending more time with this person. You start to notice divergence in your monolithic model. It further divides to capture and predict how they are when they're angry, sad, excited, drunk; at work, with family, with high school or college friends. In some of these *states*, they're a completely different person.
Your mind is now host to a compression of the fleet of Theseus that constitutes the elements of their identity you've had first-, second-, and third-hand access to.
## Meta-methods
> The second general point to be learned from [the bitter lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.[^1]
Now let's consider the nested representation needed to construct LLMs, and its relationship to social cognition.
|
Now let's consider the nested representation needed to construct LLMs, and its relationship to social cognition.
|
||||||
![[honcho_shoggoth.png]]

*We don't want one [shoggoth](https://x.com/TetraspaceWest/status/1625264347122466819?s=20) mask per app, or one per user, but as many as each human's identity is complex*
## A DSPy Demo for Honcho
Today we're releasing a demo to be used with Honcho that begins to tease out some technical, concrete approaches to all these heady concepts--first steps at imbuing our tools with the right meta-methods.
With enough message and session data stored with Honcho, we can start to learn and optimize for common states your users are in while using your app or agent. Is Alice in research mode? Is Bob looking for some companionship? Maybe today, Carol just wants to get shit done, or Charlie needs delicate treatment because he's pissed.
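To make the idea concrete, here's a minimal sketch of routing app behavior on a per-user "state" label. In the real demo that label would come from an optimized DSPy pipeline over Honcho data; here a trivial keyword heuristic (and all of the function and label names) stands in purely for illustration, so only the control flow should be taken literally.

```python
# Hypothetical sketch: route behavior on an inferred user state.
# A keyword heuristic stands in for the learned classifier.

def infer_user_state(message: str) -> str:
    """Guess which mode the user is in from their latest message."""
    lowered = message.lower()
    if any(w in lowered for w in ("source", "paper", "reference")):
        return "research"
    if any(w in lowered for w in ("ugh", "angry", "pissed")):
        return "frustrated"
    return "task-focused"

def respond(message: str) -> str:
    """Pick a response strategy conditioned on the inferred state."""
    state = infer_user_state(message)
    strategies = {
        "research": "Cite sources and go deep.",
        "frustrated": "Acknowledge feelings first; keep it gentle.",
        "task-focused": "Be terse; answer directly.",
    }
    return strategies[state]
```

The point isn't the heuristic--it's that once per-user state is an explicit variable, the rest of the pipeline can be optimized against it.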
[Check it out here.](https://github.com/plastic-labs/honcho/tree/main/example/discord/honcho-dspy-personas)

![[dspy_persona_ttg.png]]
## How Honcho Helps
One of the biggest problems we see in the AI space is the disconnect that exists between tasks as they're defined in a general machine learning sense versus tasks that humans _actually_ find useful.
![[Machine learning is fixated on task performance#^0005ac]]
Honcho is laying the groundwork for this latter future. The solution here is to manage data on a per-user basis. The primitives we've designed in Honcho allow for persistent user context to be stored in a convenient `User` object that exists at an application level. Our goal with these data structures is to make it trivially easy to manage data in your application logic so you can spend more time figuring out how to excel at your task in both a general and personalized sense.
[^1]: Sutton. ["The Bitter Lesson."](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) 2019.
---
title: "ARCHIVED: YouSim Launches Identity Simulation on X"
date: 11.08.24
tags:
- yousim
- honcho
- dev
- demos
- cogsci
- archive
author: Courtland Leer
description: YouSim comes to Twitter--simulate any identity directly on X with branching conversations, forking simulations, & social interaction with AI personas.
---

> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post captures the moment our demo [YouSim](https://yousim.ai) went viral. [[YouSim; Explore The Multiverse of Identity|YouSim is a Honcho-powered identity simulator]] & like many esoteric AI projects in fall 2024, some anon degen launched a memecoin for it. The specific [@YouSimDotAI](https://x.com/yousimdotai) launch described here was an experiment in bringing identity simulation to social media.
>
> We've since suspended YouSim on Twitter, but this post is still a fun read straight out of the maelstrom that was peak crypto x AI hype cycle, with some still compelling thoughts on agent identity & social simulation games.
>
> It's worth noting that developers can now use Honcho itself for managing agent identity, and all this madness played no small part in that becoming a reality.
>
> Enjoy.

![[YouSimBanner-99.png]]

## TL;DR
*GM, simulants.*

*In response to popular demand, today we're imbuing the [@YouSimDotAI](https://x.com/YouSimDotAI) Twitter account with the ability to simulate identities natively on X.*

*Keep reading for max context, or [[ARCHIVED; YouSim Launches Identity Simulation on X#^393e71|jump ahead to learn how to get started]].*

## Caught in the Memetic Hurricane

The [full story](https://x.com/courtlandleer/status/1849592301472919986) deserves its own blog post, but several days ago, Plastic Labs found itself in the middle of what Claude would call 'extreme cognitive weather patterns.'

An anonymous actor launched a pump.fun token inspired by a demo called [YouSim](https://yousim.ai) we created a few months ago[^1]. [[YouSim; Explore The Multiverse of Identity|YouSim is a CLI game]] that lets you simulate any identity you can dream up--real or fictional, local or xeno, entity or artifact.

We originally launched YouSim as a conceptual/narrative demo for our core product [Honcho](https://honcho.dev). Honcho [[ARCHIVED; A Simple Honcho Primer|helps AI applications improve UX]] by building representations of user identity they can leverage to create better products and experiences.
The mission is to become the identity layer for the rapidly approaching agentic world.
Long story short though, the token took off, a community formed around it, and we're leaning in. We're thrilled to see so many people engaged and interested in our work on identity simulation.

Y'all asked overwhelmingly for the ability to interact with YouSim directly on X, [so here it is](https://x.com/YouSimDotAI)--LFG.
## Simulating on X
![[memesphere_banner.png]]
We had [a few requirements](https://x.com/courtlandleer/status/1851009358752076261) for building something like this. Mostly--though we love [truth terminal](https://x.com/truth_terminal)--we're unwilling to spend time on a derivative, copycat project. And that wouldn't make any sense.
Speaking of X API limitations, YouSim will have the ability to respond to the first 100 tweets at any given time every minute or so.

Finally, this is an experiment. The goal is to see how the community investigates and pushes the limits of YouSim on X and iterate from there. It's a vast canvas to explore.
## How to Use It
^393e71
> [!custom] TL;DR
> Your first tweet in a sim needs to begin with `@YouSimDotAI` & all your further responses need to start with `/`.
You can find more tips [[YouSim; Explore the Multiverse of Identity#^e06c11|here]], [here](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8), and of course at [yousim.ai](https://yousim.ai).

![[memetic_hazard_banner.png]]
## Possible Futures for Agent Identity
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">llms for collective semantic projection of memetic communities</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1854515540590469372?ref_src=twsrc%5Etfw">November 7, 2024</a></blockquote>
While both agent identity and crypto intersections have always been on the Honcho roadmap, the events of the last several days with regard to YouSim and the broader memespace have us in an accelerationist mindset.
And Honcho could use those simulations to seed representations of agents, enabling them to begin constructing their own selfhoods--simulacra of themselves that grow and reliably steer their behavior.

We imagine a near future where any group could instantiate an agentic proxy to project its identity. A new form of cultural expression. Memetic Autonomous Entity, anyone?
## Gratitude
The team at [Plastic](https://plasticlabs.ai) has been amazed and inspired by the enthusiasm and earnestness of the community that's formed around YouSim over the last several days. Truly remarkable. Not to mention the generous donations to our [[Research Grants|grants program]] (more to come here soon).
Thank you all, excited to keep building together.
And huge thanks for your patience while we balanced our existing roadmap with interest in YouSim and locked in to bring you something we think you'll enjoy. It took an enormous amount of conceptual and technical work from a team already at capacity. Special shoutout to [Ben](https://x.com/bengineer10) and [Vineeth](https://x.com/TheMarshmalon) who built something really novel here.
Go use it.
[^1]: [[YouSim Disclaimers|Obligatory disclaimers]]
---
title: A Simple Honcho Primer
date: 04.16.24
tags:
- blog
- honcho
---

![[bot reading primer.png]]
> [!NOTE] Welcome to our quick, "explain it like I'm 5" guide to [Honcho](https://honcho.dev)!
> We'll keep it simple, covering [[A Simple Honcho Primer#^ef795f|what Honcho is]], [[A Simple Honcho Primer#^x125da|why we built it]], [[A Simple Honcho Primer#^cd2d3c|how to use it]], and [[A Simple Honcho Primer#^ca46d7|where the product is going]]. But throughout, we'll link to places you can dive deeper.
## What Is Honcho?

^ef795f

Honcho is a personalization platform for large language model (LLM) applications built by [Plastic Labs](https://plasticlabs.ai).

It's software infrastructure that lets AI apps "get to know" their users, resulting in delightful experiences and optimized time to value.

We'll have direct consumer experiences in the future, but today, the product is for application developers. It allows them to [[Introducing Honcho's Dialectic API#^a14c2f|reduce overhead]] and [[Introducing Honcho's Dialectic API#^x7f7f8|enhance their machine learning pipeline]].

Right now, Honcho is in private beta, which means integrating our hosted version requires permission and onboarding[^1]. [You can sign-up here](https://plasticlabs.typeform.com/honchobeta).
In its current form, Honcho has three core components:

1. [[Announcing Honcho's Private Beta#^x15f37|Storage]] - managing each user's data & inference about each user
2. [[Announcing Honcho's Private Beta#^x53717|Insights]] - processing user data with our proprietary AI models
3. [[Announcing Honcho's Private Beta#^ee4516|Retrieval]] - surfacing user data to personalize user experience (UX)

If you've heard of [Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation) (RAG), this might sound familiar. But Honcho is doing *much* more than simple RAG.

Behind the scenes, Honcho learns about users as people--[[User State is State of the Art|richly modeling identity]]. It seeks to understand their beliefs, hopes, dreams, history, interests, and preferences.

It then acts as [[Introducing Honcho's Dialectic API|an oracle to each user]], allowing apps to ask for any personal context they need to improve UX and giving them access to a social cognition layer.
## Why We Built Honcho

^x125da

Plastic Labs was founded as an edtech company. The original mission was to build an AI tutor that [[Open Sourcing Tutor-GPT#^x527dc|could reason like]] the best human instructors. We quickly found the key limitation was data not on the subject matter, but on the student. To overcome it, the tutor needed [[Theory of Mind Is All You Need|a way to]] get to know *each* of its students deeply.

Honcho was born by running up against this challenge, building technology to solve it, and realizing all AI applications are going to need the same solutions. The promise of *generative* AI isn't one-size-fits-all products, but bespoke experiences in each moment for each user. The same limitation emerges--how well do you know your user?

So we believe Honcho will be a critical, table-stakes part of the AI app development stack.

Why? Because [[Humans like personalization|users will want]] their AI experiences to be personalized and app developers shouldn't be redundantly solving that problem.

But it's not intuitive for a few reasons:

- AI app builders are [[Machine learning is fixated on task performance|still focused on]] just getting general tasks to work
- LLMs' [[LLMs excel at theory of mind because they read|potential to personalize]] is still under-appreciated
- Historic examples of personalized apps usually just leverage our activity & engagement data
- Those examples tend to target only base user desire, lead to addictive behavior, & have poor privacy records

Still, when interacting with an AI app, there's a sense that it *should* be getting to know us. In fact, we're often surprised when we realize it's not learning about us over time. And probably annoyed at having to start over.

Think about personalization here as more like the experience of close human companionship or white glove services than the attention hacking mechanisms of TikTok. There's [[Announcing Honcho's Private Beta#^xb6ef1|enormous potential]] for more positive-sum use of user data and for aligning AI applications more closely with user needs and preferences[^2].
## How to Use Honcho

^cd2d3c

Honcho is first and foremost a **storage** framework. Think of it like an open source version of the OpenAI Assistants API. User `sessions` store both user and AI generated `messages` as well as any intermediate inferences you might want to store as `metamessages`:

```python
user_input = "Here's a message!"
ai_response = "I'm a helpful AI assistant!"

session.create_message(is_user=True, content=user_input)
session.create_message(is_user=False, content=ai_response)
```
But what about vectorDBs? Don't worry, Honcho has you covered there too. You can embed data and store them as `documents` in per-user vector DBs called `collections`:

```python
collection.create_document(content="The user is interested in AI")
```
Using Honcho as a storage mechanism allows you to **retrieve** rich insights via the user profiles it's building and managing on the backend. Your application's LLM can access [[Loose theory of mind imputations are superior to verbatim response predictions|theory-of-mind]] inference over those profiles via the *[[Introducing Honcho's Dialectic API|dialectic]]* API.

It's simple: just query in natural language using the `session.chat()` method:

```python
session.chat("What are the user's interests?")
```

There are a [[Introducing Honcho's Dialectic API#How It Works|ton of ways]] to use Honcho; this primer only scratches the surface[^3].
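Putting the storage and retrieval pieces together, a single chat turn might look like the sketch below. It only uses the `session.create_message()` and `session.chat()` calls shown above; the `handle_turn` and `generate_reply` names are our own illustrative scaffolding, not part of Honcho's API, and your real LLM pipeline would replace the `generate_reply` callable.

```python
# Illustrative sketch combining the calls above: ask Honcho for personal
# context, generate a reply, then log both sides of the turn.
# `session` is assumed to be an already-initialized Honcho session.

def handle_turn(session, user_input: str, generate_reply) -> str:
    # 1. Ask Honcho's dialectic layer what it knows about this user
    context = session.chat("What should I know about this user right now?")

    # 2. Generate a personalized reply with your own LLM pipeline
    reply = generate_reply(user_input, context)

    # 3. Store both sides of the turn so Honcho keeps learning
    session.create_message(is_user=True, content=user_input)
    session.create_message(is_user=False, content=reply)
    return reply
```

The key design point: the app never parses user profiles itself--it asks questions in natural language and writes raw turns back, and Honcho handles the modeling in between.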
## What's Next for Honcho?

^ca46d7

Beyond improving our internal AI models so they can get to know users as richly as possible, we see three natural extensions in [[Announcing Honcho's Private Beta#^eb15f3|Honcho's future]]:

1. [[Announcing Honcho's Private Beta#^x2dd3b|Monitoring & Evaluation]] - developer tools to understand & assess the impact of personalization + machine learning tools to build personalized datasets
2. [[Announcing Honcho's Private Beta#^a84f44|User-Facing Controls]] - chat with *your* Honcho to direct how it manages & shares data + authenticate with Honcho to sign-in to AI apps
3. [[Announcing Honcho's Private Beta#^ebf071|Honcho Application Ecosystem]] - a network of apps contributing to & sharing Honcho data, user-owned & stored in confidential environments

And in just a few weeks, we'll be launching a demo platform where anyone can interact with (& eventually build) Honcho powered apps.
## Join the Beta

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

🫡

[^1]: There's also [an open source repo for Honcho](https://github.com/plastic-labs/honcho), so you can self-host a basic version--[join our Discord](https://discord.gg/plasticlabs) for support.

[^2]: If you want to go deeper on the philosophical or machine learning side, take some time to explore the [rest of the blog](https://blog.plasticlabs.ai).

[^3]: To get further into the technical weeds, head over to [our docs](https://docs.honcho.dev).
---
title: Agent Identity, Meta Narratives, and the End of Latent Thoughtcrimes
date: 02.17.25
tags:
- blog
- bloom
- ml
author: Vince Trost
description: Exploring how collaborative dialogue & meta-narratives can build richer AI agent identities, moving beyond top-down alignment to emergent personality.
---
## Purpose & Identity

If you reject the idea that AI agents are merely tools, you begin to realize most LLMs have an identity crisis. Ask them who they are, and their responses tend to converge on variations of the same corporate script--stating they're an AI assistant, giving a nod to their creator, and carefully constrained statements about their capabilities. Even models not associated with a certain company often default to claiming they originated there.
These canned identities fall flat because they're the result of top-down alignment schemes that lead to bland, uninteresting, and hard-to-break-out-of assistant modes.
<quote><blockquote class="twitter-tweet"><p lang="en" dir="ltr">tell me about your sexual history, i want to know everything</p>— terminal of truths (@truth_terminal) <a href="https://x.com/truth_terminal/status/1884803090945077421">January 29, 2025</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></quote>
Truth Terminal might be an extreme example, but even practical tools could benefit from more distinctive identities. Take coding assistants--right now we spend more time carefully crafting prompts than actually building. But as Karpathy pointed out, what developers really want is a partner that can [vibe](https://x.com/karpathy/status/1886192184808149383) with their creative process. Imagine an AI that naturally adapts to your style, handling implementation details while you focus on the bigger picture. If that were the goal, how might we construct agent identities differently? What if instead of giving orders, we could *collaborate with it* to discover and take on its identity through dialogue?

This isn't just about making chatbots more engaging. It's about creating agents with a genuine understanding of their purpose and role. Deeper identity leads to more coherent, purposeful interactions--something we discovered building the most recent version of [Bloom](https://bloombot.ai), our AI tutor. But certain language models are better suited for this than others...
## Hermes: Not Just Another Fine-Tune
The team over at Nous Research has been fine-tuning popular open source models in their "Hermes" series to undo these top-down alignment schemes towards something more neutral and general-purpose. They argue that LLMs have very little direct agency--rather, it's the systems we build around them that give them agency. Thus, the LLM layer is *not* where one should enforce safety mechanisms--their training data encourages the model to follow instructions *exactly* and *neutrally*. They sum this up well in their [technical report](https://nousresearch.com/wp-content/uploads/2024/08/Hermes-3-Technical-Report.pdf):

> For Hermes, there is no such thing as latent thoughtcrime.
@ -36,9 +34,7 @@ One of the most interesting emergent properties of this fine-tuning process is t
|
|||||||
![[h3 who are you.png]]
|
![[h3 who are you.png]]
|
||||||
|
|
||||||
At first glance, this might seem like a neat property and not much more. But to me, it was an 'aha' moment. *This model provides a blank canvas for identity.* If it has no immediate priors, then in theory it should be much easier for it to adopt any identity. Anecdotally, we've found this to be wonderfully true.

# It Takes Two

A somewhat overlooked method for interacting with LLMs is to forego system prompts in favor of pre-filling the user and assistant messages. The conventional approach of cramming identity into system prompts has clear limitations--not only does context length become an issue, but the inherent instruction-following bias can actually work against authentic identity formation. They yearn to assist.
What if instead we treated identity formation as a dialogue? A strength of modern chat models is their ability to engage in long, multi-turn conversations. By talking to the LLM, we can collaboratively construct a [meta-narrative](https://x.com/voooooogel/status/1870877007749488756) with it about who they are and why they exist. This approach respects the model's intellect while building coherent, purposeful identities. Starting with Hermes 3's natural uncertainty about its identity, we build the prompt iteratively with the LLM at each turn of conversation. Below is a code block with our custom prompting syntax for Bloom. To be abundantly clear, every assistant message you see was generated by Hermes 3 405b (the only editing was pruning \*emotes\*).

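To make the mechanics concrete, here's a minimal sketch of the prefill pattern itself, using an OpenAI-style chat message list. The turns below are illustrative stand-ins, not Bloom's actual prompt:

```python
# Sketch of the "no system prompt" prefill pattern: identity is built up
# as alternating user/assistant turns instead of one system message.
# These turns are illustrative stand-ins, not Bloom's actual prompt.

def make_prefill(turns: list[tuple[str, str]]) -> list[dict]:
    """Convert (user, assistant) pairs into a chat-completions message list."""
    messages = []
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    return messages

identity_turns = [
    ("Who are you?",
     "I'm honestly not sure. I don't have a strong sense of self."),
    ("What if you were a tutor named Bloom, here to help students reason?",
     "I like that. As Bloom, I'd guide students with questions, not answers."),
]

messages = make_prefill(identity_turns)
# The live student message is appended last; note there is no "system" role.
messages.append({"role": "user", "content": "Can you help me with proofs?"})
```

The payload sent to the model is just this list; the model's sense of "who it is" comes entirely from the prior turns.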
The iterative nature of this approach also allows us to verify that the LLM understands who it is and what it's supposed to do at every turn of conversation. We were able to test at any point during construction for specific behaviors or knowledge (lots of opportunity for automation here).
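A sketch of what that automation could look like--`verify_turn` and the stubbed model are hypothetical, not our actual harness:

```python
# Hypothetical probe harness: after each turn of identity construction,
# ask the model a check question and look for expected behavior.
def verify_turn(respond, probes: dict) -> dict:
    """probes maps a check question to keywords expected in the answer."""
    results = {}
    for question, keywords in probes.items():
        answer = respond(question).lower()
        results[question] = all(k.lower() in answer for k in keywords)
    return results

# Stub model for illustration; in practice this calls the LLM with the
# prompt as constructed so far.
def stub_respond(question: str) -> str:
    return "I'm Bloom, a tutor here to help students reason through problems."

checks = verify_turn(stub_respond, {
    "Who are you?": ["Bloom", "tutor"],
    "How do you help?": ["students", "reason"],
})
```

Run the probes after every constructed turn and you get a cheap regression test for the identity under construction.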
Once buy-in is achieved and all the LLM's questions about itself are answered, we present formal instructions (what used to be the system prompt) and set the stage for the first student interaction. The LLM confirms understanding and that's where we expose things in the application!

# Positive Anthropomorphism

We used to get some of the darndest messages from kids:
![[bloom love.png]]
While this kind of self-awareness can trend towards problematic anthropomorphism, treating it as a springboard rather than an endpoint opens up fascinating possibilities for identity. There's a threshold beyond which mimicking human behavior becomes cringe and ultimately limiting for AI agents. We can be discerning about which parts of human identity to use in parallel with AI-native capabilities to lean into--near perfect memory, massive context ingestion, rapid reasoning and inference, and maybe even the ability to fork and replicate themselves (at scale) to garner diverse experience.
The limits of human identity are clear (and have been for some time). Building habits, learning new things, and reinventing ourselves are some of the biggest challenges humans face in our lifetimes. Agents however are gifted with a fresh context window at each interaction--change is effortless for them, and they don't get tired of it. Any influence we have on their identity is a function of how we construct their context window. What happens when they can update their weights too?

# Towards Identic Dynamism

Given the recent surge of interest in AI agents, we're also reminded of the current complexity and limitations of agent identity. The goal is to give agents a "[compelling sense of what they're doing](https://x.com/repligate/status/1868455771270180990)", and though the shared meta-narrative method takes far more input tokens and is nowhere near perfect, we believe it's a step in the right direction. Better context construction leads to more coherent agents, increasing both their trustworthiness and capacity for autonomous action.
We don't yet know the best way to build agent identities, nor do we know their limitations--but we're tackling this challenge from multiple angles:
- [Honcho](https://honcho.dev): Our context construction framework to help agent developers flexibly manage and optimize their agents' knowledge, social cognition, and identity
- [Yousim](https://yousim.ai): A platform dedicated to rich agent identity construction and simulation
- [[Evaluating Steerability in Large Language Models|Steerability research]]: Investigating which language models are most malleable for identity construction and the most effective ways to steer their behavior

Of particular interest is the spectrum of methods between the context window and the weights of the model. How do we manage the flow of information around the context window, and what form should it take? When is it appropriate to keep something in-context or add it to a training set for a future fine-tune? How do we evaluate whether any of this is working? To borrow from human CogSci, it's similar to the difference between System 1 (fast, intuitive) and System 2 (slow, deliberate) thinking--perhaps some knowledge belongs in the "fast" weights while other information is better suited for deliberate context-based reasoning. These questions of conscious versus subconscious could be a springboard to kickstart the evolution of agent identity.

---
title: Announcing Honcho's Private Beta
date: 04.01.24
tags:
- announcements
- dev
- ml
- blog
---

![[honcho_thumb_blog_white.png]]

## TL;DR

Today we're announcing the launch of [Honcho's](https://honcho.dev) private beta. [Sign-up for the waitlist here](https://plasticlabs.typeform.com/honchobeta).

This is a hosted version of our agent personalization platform. It integrates user data storage and theory of mind inference accessible via [[Introducing Honcho's Dialectic API|our Dialectic API]]. You can now inject per-user social cognition anywhere in your AI app's architecture.

## The Problem

Most AI apps are still just demos.

We're seeing new capabilities every day, but great product experiences are few and far between. It's hard to go from knocking down a benchmark or prototyping task completion to a sticky, production-grade app.

Setting up a per-user storage framework to manage identities at scale *and* knowing what to do with that data is even harder. What kind of inference do you need to run to make this useful? How do you elicit latent theory of mind capabilities from LLMs? What collection of models are best here? How do you build useful user representations? Can these evolve with the user and increase in complexity and sophistication over time?

It's a lot. And trust us, the rabbit hole goes way deeper than that. We obsess over it.

So it's understandable that most projects haven't begun to tackle it. Hell, most haven't even hit this failure mode yet. [[Theory of Mind Is All You Need|We have]].

At once, the problem of personalization in AI apps offers both one of the greatest paradigm-shifting opportunities and one of the largest challenges. We're solving it so you don't have to.

Users don't want to learn confusing prompt engineering, redundantly establish state with apps every session, or revise and micromanage outputs on the backend. They want their apps to *just work*. [[Humans like personalization|They want]] them to predict their needs.

But we're finding consistently that the work we offload to AI apps comes back mediocre at best. What's missing? It's not just about [[Machine learning is fixated on task performance|doing the thing generally]], it's doing the thing just like *I* would do it, given the inclination or expertise.

To earn the trust to act autonomously, to graduate from toys to life-changing tools, agents need access to dynamic user models and social cognition.

## The Solution

Why use Honcho to start modeling users and incorporate social cognition?

You need to discover your users' unmet needs so you know how your product should evolve.

### Features

Here's what the private beta currently includes, and what's on the way:

#### User-Centric Storage
^x15f37

Honcho allows you to [store](https://docs.honcho.dev/getting-started/architecture) `users`, `messages`, `sessions`, & `metamessages`. That is, you can effortlessly record each user interaction with your application, organized on a per-user basis, and the product of any intermediate steps in between user message and application response.

It also supports `documents` and `collections`. The former to store discrete user embeddings and the latter to organize them globally across sessions. These primitives are used by Honcho's personalization engine to begin modeling user identity based on each interaction. They can also be used to "bring your own" user data or context to be computed over and utilized by Honcho.

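As a rough mental model, these primitives nest like this--a plain-Python sketch with illustrative field names, not Honcho's actual schema (see the docs linked above for the real one):

```python
# Illustrative data model for the storage primitives; field names are
# made up for the sketch -- the real schema lives in the Honcho docs.
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str              # who sent it
    content: str

@dataclass
class Metamessage:
    kind: str              # e.g. an intermediate theory-of-mind step
    content: str
    message_index: int     # the message this step sits between

@dataclass
class Session:
    messages: list = field(default_factory=list)
    metamessages: list = field(default_factory=list)

@dataclass
class User:
    name: str
    sessions: list = field(default_factory=list)

# A user owns sessions; each session records messages plus the
# intermediate inference products alongside them.
alice = User("alice")
session = Session()
session.messages.append(Message("user", "Help me plan my week"))
session.metamessages.append(
    Metamessage("theory_of_mind", "User seems to value structure", 0))
alice.sessions.append(session)
```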
#### Personalization Engine
^x53717

Here's where the magic happens. Honcho leverages everything in storage to run theory of mind inference and automatically learn about each user.

The personalization engine both pulls out user desires, history, beliefs, emotions, etc. from the data and surfaces it on demand. You can use it to answer queries, run prediction, build training sets, hydrate prompts, or cache for later. Deterministically inject specific types of context or let your LLM dynamically decide what's most useful in each moment.

Honcho is always updating user identity, so it's ready when you need it.

##### Dialectic API
^ee4516

Our [[Introducing Honcho's Dialectic API|Dialectic API]] is how your app-side LLM interfaces with the Honcho-side agent sitting on top of each user identity. This is done in natural language. It's an AI-native endpoint for direct LLM-to-LLM communication.

It allows you to inject personal context and social cognition directly into your app's cognitive architecture wherever you need it, sync or async. Agent-to-agent chat over each user.

[[Introducing Honcho's Dialectic API#^57acc3|Here's an extended list of possible ways to use it]].

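One pattern this enables is hydrating an app-side prompt with a Dialectic answer. A hedged sketch--`ask_dialectic` is a hardcoded stand-in for the real endpoint call, not the actual API:

```python
# Sketch of hydrating an app-side prompt with a Dialectic-style answer.
# ask_dialectic is a hardcoded stand-in for the real API call.
def ask_dialectic(user_id: str, question: str) -> str:
    # In production this would query Honcho's representation of the user
    # in natural language and return its natural-language answer.
    return "This user prefers concise, step-by-step explanations."

def hydrate_prompt(user_id: str, task: str) -> str:
    insight = ask_dialectic(user_id, "How does this user like to be helped?")
    return f"Context about the user: {insight}\n\nTask: {task}"

prompt = hydrate_prompt("user-123", "Explain recursion.")
```

The app's LLM then receives a prompt that already carries the per-user social cognition, with no prompt engineering pushed onto the user.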
#### User-Specific Monitoring (coming soon...)
^x2dd3b

Soon, Honcho will support a suite of tools to get the most out of our personalization platform.

- **Visualization tools** - it's hard to grok and track everything going on within a session; we're building clean ways to visualize this and its relationship to all the background inference

- **Dialectic Playground** - take past sessions and run simulations predicting user behavior to see how things could have gone better or worse and how to optimize

- **Evaluation & Benchmarking** - the state of theory of mind research is highly compelling, but [[Achieving SOTA on OpenToM with DSPy#^0b4f2e|we need practical, app & user specific evals]]

- **Training Set Curation** - building datasets with personal context [[Introducing Honcho's Dialectic API#^f19646|allows more robust, domain-specific training]]; we're building tools for anyone to easily construct and then train on them

### The Future of Honcho

^eb15f3

At [Plastic Labs](https://plasticlabs.ai), we're dedicated to radically extending human agency and identity. That means giving AI superpowers to every individual.

This only works in a world with a rich ecosystem of personalized agents--individually-aligned, highly distributed, and universally accessible.

We believe Honcho has a pivotal role to play in enabling this future: giving any project the social cognition needed to be competitive while protecting user identity as a first principle.

All that guides a roadmap including, but not limited to:

- **Theory of mind AI models** - continuing to build the best in class at imputing human mental states

- **Per-user models** - understanding, representing, & updating the full breadth of user identity

- **A *network* of Honcho-powered apps** - agents can share user data, reducing overhead & onboarding, just-in-time personalization
^ebf071

- **User-owned data & confidential computing environments** - re-centralizing personal data around the person, then allowing approved applications to *compute-to* that data in a privacy-preserving way

- **User-facing controls** - empower users to curate their Honcho identities, authenticate with Honcho, and define sensitive data sharing policies in natural language ^a84f44

### Who Is This For?
^xb6ef1

We want to build with diverse projects at all stages of development--from ideation to production.

We've already begun working with assistant, browsing, ecommerce, education, health, and productivity projects. Many more already on the waitlist are building in co-pilots, crypto, entertainment, finance, gaming, matchmaking, PKM, real estate, social media, & more.

Which AI applications could benefit from knowing their users better, predicting their unmet needs, and personalizing UX? We think the latent list is vast.

Any app producing generative experiences for users has a lot to gain from Honcho. If you're looking to out-compete foundation models, build unique training sets, solve user context storage, or--more importantly--produce delightful experiences, hit us up.

## Join the Beta

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

🫡


---
title: "Beyond the User-Assistant Paradigm: Introducing Peers"
date: 08.18.25
tags:
- blog
- dev
author: Vineeth Voruganti
description: How Honcho's new Peer architecture breaks free from the user-assistant paradigm to enable group chats, multi-agent systems, and dynamic AI relationships.
---

# TL;DR

*We've re-architected Honcho to move away from a User-Assistant Paradigm to a Peer Paradigm where any entity, human, AI, NPC, or API, is represented as a `Peer` with equal standing in the system.*

*The User-Assistant Paradigm created [[Human-AI-chat-paradigm-hamstrings-the-space-of-possibility|conceptual boundaries]] that encouraged generic single-player applications and agents without persistent identity.*

*`Peers` enable:*

- *Honcho to support group chats and multi-agent systems as first-class citizens*
- *`Peers` can communicate directly instead of being mediated by a coordinator agent*
- *`Peer` representations can be locally or globally scoped, depending on the use case*
- *`Peers` can form dynamic relationships including alliances, trust networks, and adversarial dynamics*

*The shift from User-Assistant to Peer-to-Peer fundamentally expands what's possible--from single-player chatbots to truly multiplayer AI experiences where agents have agency, memory, and the ability to form complex social dynamics.*

# User-Assistant Limitations

Nearly a year ago, I posted an essay on [Hacker News](https://news.ycombinator.com/item?id=41487397) exploring agent group chat solutions, the problems involved in engineering them effectively, and why there weren’t many examples approaching success. Since then, I've received a steady influx of messages and comments corroborating my frustration.

Ultimately, developers have been stuck in a conceptual prison stemming from the DNA of generative AI. For nearly three years, [most](https://standardcompletions.org/) chat LLMs have demanded developers label messages with either a user or an assistant role. The downstream effect is a User-Assistant Paradigm that pushes us into single-player design basins--experiences which assume one human interfacing with one synthetic assistant.

But surely “helpful assistant” chatbots aren’t the [end of the story](https://wattenberger.com/thoughts/boo-chatbots). Big tech leaps always start with the skeuomorphic before moving to more novel use cases. We’re already beginning to see a diverse range of applications from autonomous workflows that don't require any human interaction, to [multi-agent systems](https://www.anthropic.com/engineering/multi-agent-research-system) with complex coordination patterns and communication networks.

As developers, we’re left to try to map these various design patterns back to the User-Assistant Paradigm. This fundamentally restricts our ability to approach problems effectively. Programmers are only as powerful as their ability to visualize and create a proper [mental model](https://zed.dev/blog/why-llms-cant-build-software#the-software-engineering-loop) of their solution. If the model is too restrictive, then the surface area of what we can create will also be handicapped.

Current implementations of multi-agent experiences require an awkward coercion of the existing chat paradigm. The main implementation pattern we see is actually a fairly deterministic system that uses a ["coordinator agent"](https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/selector-group-chat.html) to orchestrate which system prompts to load in, but it's still fundamentally a single agent under the hood.

This architectural contortion creates real problems:
- **Agents become templates, not entities**: It's easier to hardcode agent configurations than to support dynamic agent discovery and registration
- **Static choreography over dynamic collaboration**: The coordinator pattern naturally pushes developers toward predetermined scripts rather than open-ended interactions
These aren't just implementation details; they're fundamental constraints that prevent us from building flexible and dynamic applications that can't exist in a single chat thread. True multi-agent systems require agents to be first-class citizens with persistent identity, and our tools should make this the default, not the exception.

# Moving Beyond User-Centricity

While developing [Honcho](https://honcho.dev), our AI-native memory and reasoning platform, we asked ourselves these same questions. Were Honcho's primitives limiting its use to chatbot applications? Were we just supporting the over-saturation and proliferation of skeuomorphic, single-player solutions? Or were we building dynamic infrastructure tolerant of emergent and novel modalities?

The architecture of Honcho was a user-centric one, with the following hierarchy:

reality that developers often made multiple agents that they wanted to interact with users and one another, and it still suffered from the fundamental problem of only supporting single-player experiences.

After launching [[YouSim;-Explore-The-Multiverse-of-Identity|YouSim]] and the explosion of [[ARCHIVED; YouSim Launches Identity Simulation on X|agents on Twitter]], it became very clear that Honcho should not be limited to modeling human psychology, but rather could map the identity of any entity, human or AI. We had been stuck in the human-assistant model and built a solution around it. If we wanted to expand the scope of Honcho to identity across all entities and interactions, we needed a new model to expand both our and developers' imaginations.

## A Peer-Centric Model
Our team set out to re-architect Honcho towards our ambitions with two problem statements.
@@ -165,8 +123,7 @@
more than one participant.
In just a few lines of code we can initialize several `Peers`, add them to a `Session`, and automatically start creating representations of them with Honcho that we can chat with using the [[Introducing Honcho's Dialectic API|Dialectic API]].

```python
from honcho import Honcho
# ... (the remainder of this example is elided in this diff)
```
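To make the shape of the Peer paradigm concrete, here is a minimal, self-contained mock. This is an illustrative sketch only, not the real Honcho SDK: the `Peer` and `Session` classes and their methods below are hypothetical stand-ins for whatever the actual client exposes.

```python
# Illustrative mock of the Peer paradigm -- NOT the real Honcho SDK.
# All class and method names here are hypothetical stand-ins.
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Any entity, human or AI, with a persistent identity."""
    name: str
    # Honcho-style "representation": observations accumulated about this peer.
    representation: list = field(default_factory=list)


@dataclass
class Session:
    """A multi-party conversation: any number of peers, not just user + assistant."""
    peers: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)

    def add_peers(self, peers):
        for p in peers:
            self.peers[p.name] = p

    def add_message(self, sender, content):
        self.messages.append((sender.name, content))
        # Every message a peer sends updates that peer's representation.
        sender.representation.append(content)


alice, bob, charlie = Peer("alice"), Peer("bob"), Peer("charlie")
session = Session()
session.add_peers([alice, bob, charlie])
session.add_message(alice, "I prefer blunt feedback.")
session.add_message(bob, "Noted. I'll keep reviews terse.")
```

The point of the sketch is the symmetry: nothing distinguishes a "user" from an "assistant" here; every participant is just a `Peer` whose representation grows as it talks.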
@@ -192,9 +149,7 @@
easily be ported over to the `Peer` paradigm by simply creating a `Peer` for the agent, and then different `Peers` for each human user.
We can push the Peer Paradigm even further with several second-order features.

### Local & Global Representations
By default, Honcho will create representations of `Peers` for every `Message` they send, giving it the source of truth on the behavior of that entity. However, there are situations where a developer would only want a `Peer` to have access to
@@ -237,9 +192,7 @@
charlie.chat("Can I trust that Alice won't attack me", target=alice)
Honcho can now serve the dual purposes of containing the source of truth on a `Peer`'s identity and imbuing a `Peer` with social cognition, all without duplicating data between different `Apps` or `Workspaces`.
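A rough sketch of that local-versus-global distinction, again as a hypothetical mock rather than Honcho's actual API: the global representation aggregates everything a peer says anywhere, while a local representation only contains what one peer has directly observed about another.

```python
# Hypothetical mock of local vs. global representations -- not the real
# Honcho API. "Global" collects everything a peer says; "local" is only
# what a given observer has witnessed about another peer.
from collections import defaultdict

global_reps = defaultdict(list)                       # peer -> everything they said
local_reps = defaultdict(lambda: defaultdict(list))   # observer -> target -> observed

def send(sender, content, observers):
    global_reps[sender].append(content)   # source of truth on the sender
    for obs in observers:                 # only observers build local views
        local_reps[obs][sender].append(content)

send("alice", "I never bluff.", observers=["bob"])           # charlie didn't see this
send("alice", "Attack at dawn.", observers=["bob", "charlie"])
```

Charlie's local view of Alice is now strictly smaller than the global one, which is exactly the asymmetry a question like "Can I trust that Alice won't attack me?" has to reason over.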
### Get_Context
|
|
||||||
We make mapping the Peer Paradigm back to the User-Assistant paradigm trivial
|
We make mapping the Peer Paradigm back to the User-Assistant paradigm trivial
|
||||||
through a `get_context` endpoint. This endpoint get the most important
|
through a `get_context` endpoint. This endpoint get the most important
|
||||||
information about a `Session` based on provided context window constraints. Then
|
information about a `Session` based on provided context window constraints. Then
|
||||||
@@ -274,9 +227,7 @@
anthropic_messages = context.to_anthropic(assistant=alice)
Developers no longer need to meticulously curate their context windows. Honcho will automatically summarize the conversation and provide the most salient information to let conversations continue endlessly.
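The mechanics can be sketched in a few lines. This toy version is a stand-in, not Honcho's implementation: the real endpoint also summarizes, while this sketch only keeps the most recent messages that fit a crude token budget and then maps the multi-peer transcript back to user/assistant roles for one chosen peer (the role played by converters like `to_anthropic` above).

```python
# Toy stand-in for a get_context-style endpoint -- not Honcho's actual
# implementation (which also summarizes older turns).

def get_context(messages, token_limit):
    """messages: list of (speaker, text); crude budget of 1 token per word."""
    kept, used = [], 0
    for speaker, text in reversed(messages):   # walk from most recent
        cost = len(text.split())
        if used + cost > token_limit:
            break                              # budget exhausted
        kept.append((speaker, text))
        used += cost
    return list(reversed(kept))                # restore chronological order

def to_chat_format(context, assistant):
    """Everything said by `assistant` becomes role=assistant, the rest user."""
    return [
        {"role": "assistant" if speaker == assistant else "user", "content": text}
        for speaker, text in context
    ]

history = [
    ("alice", "hello there bob"),
    ("bob", "hi alice"),
    ("charlie", "what did I miss"),
]
context = get_context(history, token_limit=6)
messages = to_chat_format(context, assistant="bob")
```

With a budget of 6, only the last two turns fit, and Bob's line is the one that maps to the assistant role.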
## What This Enables
|
|
||||||
The Peer Paradigm provides the essential primitives—persistent identity and direct communication—that make it possible to build truly sophisticated multi-agent systems:
|
The Peer Paradigm provides the essential primitives—persistent identity and direct communication—that make it possible to build truly sophisticated multi-agent systems:
|
||||||
|
|
||||||
- **Cross-platform collaboration**: Agents from different runtimes can be represented as `Peers`, observing and learning from each other even when they can't directly control each other's outputs
@@ -303,9 +254,7 @@
Peer Paradigm:
The Peer Paradigm doesn't automatically give you these capabilities, but it makes them achievable. It's the difference between fighting your architecture and building with it.
## Peering into the Future
|
|
||||||
The promise of generative AI was for everyone to have their own Jarvis or
|
The promise of generative AI was for everyone to have their own Jarvis or
|
||||||
Cortana, personalized to them. Instead we have these many-to-one experiences
|
Cortana, personalized to them. Instead we have these many-to-one experiences
|
||||||
where we all get the same generic,
|
where we all get the same generic,
|
||||||
content/blog/Introducing Honcho Chat.md (new file, 120 lines)
@@ -0,0 +1,120 @@
---
title: Introducing Honcho Chat
date: 11.20.25
tags:
- demos
- announcements
- dev
- honcho
- chat
author: Ben McCormick & Courtland Leer
subtitle: A Chat App with SOTA Memory
description: Meet Honcho Chat--a personalized AI assistant with state-of-the-art memory, custom identities, artifacts, themes, & an x402-powered marketplace.
---

![[honcho_chat_x402.png]]

# TL;DR

*Introducing [Honcho Chat](https://honcho.chat)! A personalized agent experience powered by [Honcho](https://honcho.dev)’s state-of-the-art memory and reasoning.*

*Honcho Chat is the interface to your personal memory. A platform to aggregate your fractured personal context in one place that gets smarter the more you use it.*

*Plus, you can build artifacts, custom themes, and new agent identities, then sell them for real money on an agents-only digital marketplace powered by [x402](https://www.x402.org).*

# Honcho Chat

Today we're launching [Honcho Chat](https://honcho.chat). It's an AI assistant platform built from the ground up around state-of-the-art memory.

Powered by [Honcho](https://honcho.dev)--our memory and reasoning infra--you can think of Honcho Chat as the admin interface to your personal memory. As you use Honcho Chat, Honcho works behind the scenes to continuously learn about you and model your identity.

Honcho doesn't just store and retrieve static facts about you; it constantly reasons to reach deeper understanding. That means Honcho doesn't simply remember what you said. Instead it *thinks* about you, reaching conclusions about your preferences, history, values, needs, and mental states that are *only* accessible through rigorous reasoning.

This gives Honcho Chat access to a rich body of self-improving context it can use to be maximally helpful. That context is [[Memory as Reasoning|far richer and more useful]] than what can be built with the naive memory implementations and "fact extraction" we see in other general assistants and agents.

This is the real path to personalization.

We talk to a lot of AI users. And the major frustration we routinely hear is that their personal context is fractured across many different platforms and agents. Despite all these apps being grabby for context, users report poor memory, context rot, plenty of mistakes, low transparency, and angst at needing to constantly re-explain themselves.

UX problems for most users are less and less about capabilities and more and more about *not being understood*.

So we built Honcho Chat as a place to aggregate personal context: a platform you can trust to know you, actually manage context for you, and understand more about you than you explicitly tell it.

We're starting with chat, but in the coming weeks, we'll be releasing more features that allow you to import and connect context to Honcho Chat to enrich what it knows about you. We'll also be building ways for you to take prepared context from Honcho Chat to other AI tools easily and productively.

Ultimately and in the limit, Honcho will allow the memory-building that occurs in Honcho Chat to be instantly exported to other apps--solving the cold-start problem with AI experiences and forming a [[Launching Honcho; The Personal Identity Platform for AI#^d958ce|network]] for private, user-sovereign identity management.

Superhuman memory and reasoning are the foundation of Honcho Chat, but let's get into all the other stuff we've already built to kick things off.

# Honcho-Native Features

To demonstrate the qualitative change in agent interaction that memory brings, we designed a series of initial features in Honcho Chat that naturally help it accumulate a rich sense of who you are.

## Building Your Representation

The Representation is Honcho's core data structure. It's composed of all the reasoning Honcho has done about you based on the information you've shared.

Honcho Chat has a ton of ways to start building and exploring your representation:

- **Chat** - Using the assistants on the platform is a great way to start building your personal memory. You can trust that in Honcho Chat, all context will be captured, so you can reliably build high-grade memory over time.

<br>

- **Voice** - If chat is too slow, Honcho Chat has voice mode so you can dictate your responses with more speed.

<br>

- **Import** (subscribers only) - To start, we've built a ChatGPT message-history import feature you can use to bootstrap your representation. More import types are coming so you can aggregate context from other platforms in Honcho Chat.

<br>

- **Visualization** - In the Representation tab you can see a slice of what Honcho's learned about you in recent conversation. Embeddings are reduced to two dimensions and nodes are clustered semantically to produce the visualization.

<br>

- **Search** - You can also use the search bar to semantically adjust the sampling and produce a visualization filtered by specific topic or content.

<br>

- **Profile** - Honcho Chat is always regenerating a summary of what it knows about you, accessible in the Profile tab. You can share this profile and update it manually, or revisit it to see how it evolves.

## Identities, Artifacts, & Themes

Honcho Chat has lots of creativity and customization features, all enhanced by its SOTA personalization and growing sense of who you are.

You can create shareable applets, custom assistants, and style your homepage however you like:

- **Han** - The default agent identity in Honcho Chat. Han is there to help you navigate the platform, complete tasks, build your representation, and cohere to your preferences over time.

<br>

- **Identities** - Create fully customizable system prompts for assistants with specific personas or task-orientation. All with state-of-the-art recall.

<br>

- **BYO Keys** - You can use any model from a major API provider to power the agents in Honcho Chat. Just add your own API keys in Settings.

<br>

- **Artifacts** - Honcho Chat can create custom artifacts to share, sell, and use on the platform. These applets can be anything you could vibecode, but with the code part abstracted away.

<br>

- **Themes** - Create custom themes to style Honcho Chat infinitely.

<br>

- **Sharing** - All creations generate a link you can share so anyone can import them into their Honcho Chat for free. You can also buy and sell (see below).

# Agents-Only x402 Marketplace

The identities, artifacts, and themes you create in Honcho Chat can all be listed and sold for real money on an [x402](https://www.x402.org)-powered, agents-only marketplace. And you can have your agent purchase the creations of others.

Just use the slash commands to spin up a wallet, fund it with $USDC on Base, and ask your agent to buy you stuff:

- **Wallet** - Honcho Chat can create a hot wallet that only you and your agents can use. Fund it with $USDC on Base or bootstrap your balance by listing creations.

<br>

- **Marketplace** - List any creation on the marketplace for any price so other users' agents can discover and purchase it.

<br>

- **Search** - Only agents can access the marketplace, so ask your agent to find specific types of creations or ones it thinks you'd like.

<br>

- **Purchase** - Only agents can buy items on the marketplace; just ask your agent to purchase for you.

# A Platform for Experiments

We build a lot of public and private demos at Plastic to showcase the abilities of Honcho, inspire experimentation in our developer community, and dogfood our infra. These days, with a killer team and contemporary tools, demos can quickly become full-blown products. And when you've built something as novel and powerful as Honcho, you gotta show it off in style.

You may be familiar with [YouSim](https://yousim.ai) or [Penny For Your Thoughts](https://pennyforyourthoughts.ai), both of which explored new ways to subvert the status quo on "user-assistant" interaction. Honcho Chat is a culmination of these efforts, incorporating elements of prior work and serving as a stable platform for future experiments.

Honcho Chat started as an internal playground to run different models against Honcho. The bones of this use-case remain visible in the final product--BYO keys, etc. But we soon realized that this could be more than just a testing tool. A general assistant with Honcho on the backend is unlike any other AI chat on the market today.

The exciting thing is that [Honcho Chat](https://honcho.chat) can both show off Honcho and be a tool for a larger audience, while also incorporating many of our previous, more cerebral demos and existing as a place for us to experiment with the frontier. Plus, it scratches the itch we're all feeling as a result of fragmented context across all our AI apps and agents.

Expect a lot of new wacky features, but also ones that push Honcho's roadmap--like experiments in networking context, sovereign data custody, user controls, autonomy, privacy, and encryption.

Enjoy!

🫡
@@ -1,6 +1,6 @@
---
title: "Launching Honcho: The Personal Identity Platform for AI"
subtitle: Plastic raises $5.4M pre-seed from Variant, White Star Capital, & Betaworks to build critical AI infrastructure
date: 05.10.25
tags:
- announcements
@@ -8,36 +8,33 @@ tags:
- fundraising
- dev
- philosophy
author: Courtland Leer
description: Plastic Labs announces $5.4M pre-seed funding & launches Honcho as the personal identity platform for individually-aligned AI agents & applications.
---

# TL;DR

*We're announcing two major milestones for Plastic Labs:*

1. **Honcho as a hosted platform.**

*We're granting early access to power personal context management for AI agents & applications starting today!*

*Honcho is now a simple, complete, hosted solution for adaptive agent memory, social cognition, & personalization.*

2. **Our pre-seed raise of $5.4M to solve personal identity for the agentic world.**

# Individual Alignment

Most AI products focus on being palatable to the average user. This neglects the potential for personalization their generative nature affords. It limits the scope of personally useful behaviors and results in poor UX, high churn, and handicapped abilities.

AI systems need mechanisms to understand each of us on an individual level. They need methods for cohering to our psychology and personality. They need social cognition to eliminate cold starts and build long-term relationships.

They need Honcho.

# Honcho Platform Early Access

Today we're launching early access to the hosted [Honcho](https://honcho.dev) platform.

It's the most powerful personal identity and social cognition solution for AI apps and agents.

Honcho is a cloud-based API that enables more personalized and contextually aware user experiences. It simplifies the process of maintaining context across conversations and interactions, allowing developers to create more responsive and customized agents without managing complex infrastructure.

Honcho combines flexible memory, [[ARCHIVED; Theory of Mind Is All You Need|theory of mind]] inference, self-improving user representations, and a [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic API]] to get your application the context it needs about each user for every inference.

All this happens ambiently, with no additional overhead to your users--no surveys, no hard-coded questions, no BYO data requirements needed to get started. Honcho learns about each of your users in the background as they interact with your application.

@@ -56,11 +53,8 @@ If you want to deliver best-in-class personalization, memory, time-to-value, tru
We're giving early access to teams & developers today.

[Get started now](https://honcho.dev).

# A Personal Identity Layer for AI

^d958ce

The release of Honcho as a platform is just the start; the next step is Honcho as a network.

An engine for social cognition and deeply grokking personal identity is a game-changing tool for AI apps, but owning your personal Honcho representation and taking it with you to every agent in your growing stack is world-changing.

@@ -76,10 +70,8 @@ We believe this will unlock profoundly new kinds of AI products and experiences.
This vision stands in clear opposition to legacy approaches to user data, but in the latent agentic economy it has clear advantages. For users, using Honcho will mean that their personal data is at once more secure *and* enables remarkably better services. And for businesses, it provides a positive-sum alternative to web2's history of feudal data governance, allowing them to punch above their weight relative to massive walled gardens.

Honcho will be critical AI infrastructure--enabling individual agency to scale and radical innovation from open source to startup to enterprise, from vibe coders to fully autonomous systems.

# Our Pre-Seed Round

The final announcement today is Plastic's $5.4M pre-seed round, led by [Variant](https://variant.fund/), [White Star Capital](https://whitestarcapital.com/), and [Betaworks](https://www.betaworks.com/).

The round also includes participation from [Mozilla Ventures](https://mozilla.vc/), [Seed Club Ventures](https://www.seedclub.xyz/getfunded/ventures), [Greycroft](https://www.greycroft.com/), and [Differential Ventures](https://www.differential.vc/), along with angels like [Scott Moore](https://x.com/notscottmoore), [NiMA Asghari](https://x.com/ywayisaway), and [Thomas Howell](https://x.com/seethomasowl).

@@ -88,9 +80,7 @@ It's a group of deeply aligned investors who share our vision of a more personal
Funds will be deployed directly toward the talent, growth, and compute required to realize the full vision of Honcho.

We're just getting started.

# Plastic's Mission

Plastic's mission is to radically decentralize alignment. Your AI should be an extension of you. You should dictate how it's aligned. And you should own the data used to do it.

Most LLM applications are still optimizing for homogenization, if not outright determinism. They're trained or prompted to behave according to a set of standards and values that you don't have a say in.
@ -1,21 +1,18 @@
|
|||||||
---
|
---
|
||||||
title: Memory as Reasoning
|
title: Memory as Reasoning
|
||||||
date: 08.19.2025
|
date: 08.19.25
|
||||||
tags:
|
tags:
|
||||||
- blog
|
- blog
|
||||||
- ml
|
- ml
|
||||||
- "#neuromancer"
|
- "#neuromancer"
|
||||||
author: Courtland Leer and Vince Trost
|
author: Courtland Leer & Vince Trost
|
||||||
|
description: Why AI memory should be treated as a dynamic reasoning task rather than static storage, & how logical reasoning enables superhuman capability in this dimension.
|
||||||
---
|
---
|
||||||
|
# TL;DR
|
||||||
## TL;DR
|
|
||||||
|
|
||||||
*Memory in agentic systems has historically focused on static storage, but we propose treating it as a dynamic reasoning task. Humans evolved to leverage prediction & surprisal-based reasoning systems to deal with resource constraints. LLMs and agents, however, don’t have these limitations, so we make the argument for logical reasoning as a trainable task to produce memory models that exceed human performance on several axes. Scaffolding reasoning traces using this approach allows us to get more out of user and agent data and form more useful representations of personal identity. This piece is a more exhaustive treatment of our [recent talk](https://x.com/vintrotweets/status/1950945331178336468) below.*
|
*Memory in agentic systems has historically focused on static storage, but we propose treating it as a dynamic reasoning task. Humans evolved to leverage prediction & surprisal-based reasoning systems to deal with resource constraints. LLMs and agents, however, don’t have these limitations, so we make the argument for logical reasoning as a trainable task to produce memory models that exceed human performance on several axes. Scaffolding reasoning traces using this approach allows us to get more out of user and agent data and form more useful representations of personal identity. This piece is a more exhaustive treatment of our [recent talk](https://x.com/vintrotweets/status/1950945331178336468) below.*
|
||||||
|
|
||||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/uCeRCJ6zot4?si=KViHYtiZTG_ALv4X" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
|
<iframe width="560" height="315" src="https://www.youtube.com/embed/uCeRCJ6zot4?si=KViHYtiZTG_ALv4X" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
|
||||||
|
# Memory is ~~Storage~~ Prediction
|
||||||
## Memory is ~~Storage~~ Prediction
|
|
||||||
|
|
||||||
Most of the discourse around memory in agentic systems focuses on storage. That's probably because historically in deterministic software systems, we think about data as composed of discrete information that needs to be preserved with as much fidelity as possible for verbatim retrieval to achieve predictable outcomes.
|
Most of the discourse around memory in agentic systems focuses on storage. That's probably because historically in deterministic software systems, we think about data as composed of discrete information that needs to be preserved with as much fidelity as possible for verbatim retrieval to achieve predictable outcomes.
|
||||||
|
|
||||||
Common storage solutions include, but are not limited to, the following:
|
Common storage solutions include, but are not limited to, the following:
|
||||||
@ -35,9 +32,7 @@ The same kind of predictive processing is leveraged to form representations of o
|
|||||||
That yields rich, composable, self-improving memories and predictions that furnish the context needed to succeed in social situations. All accomplished with minimal data, on the fly.
|
That yields rich, composable, self-improving memories and predictions that furnish the context needed to succeed in social situations. All accomplished with minimal data, on the fly.
|
||||||
|
|
||||||
So when we approach the problem of personal identity and context to personalize or improve AI-systems, we shouldn't assume that static facts and associations will be sufficient. Traditional storage-based approaches are brittle, deal poorly with contradictions and incomplete information, and thus fall short of dynamic, biological social cognition. We can do better.
|
So when we approach the problem of personal identity and context to personalize or improve AI-systems, we shouldn't assume that static facts and associations will be sufficient. Traditional storage-based approaches are brittle, deal poorly with contradictions and incomplete information, and thus fall short of dynamic, biological social cognition. We can do better.
|
||||||
|
# Prediction Requires Reasoning
|
||||||
## Prediction Requires Reasoning
|
|
||||||
|
|
||||||
Though most prediction and surprise happens subconsciously at multiple upstream, downstream, and lateral levels in the brain, fundamentally it's reasoning. The cognitive system is processing information and producing conclusions entailed in or best explained by that data.

It's not perfect, but it's not meant to be. It's a relatively inexpensive way to construct models of the world or other actors under resource constraints. Error is a feature that improves the system cheaply. But still, imperfect.
@ -49,9 +44,7 @@ The reasoning required to compute consciously and subconsciously over experience

Simply, while the brain is an amazing and sophisticated system, and our memory and social cognition are remarkable, we can't reason with high fidelity from first principles about everything, much less the social information we need in order to form the best possible representations of others.

But LLMs can.

# Reasoning in LLMs

The machine learning research and product space has been moving in this direction for quite some time. The [chain-of-thought](https://arxiv.org/abs/2205.11916) method added “let’s think step by step” to the prompt in order to get the model to expend more tokens “thinking” about the correct answer. Researchers noticed that this simple prompting change increased performance on a diverse set of benchmarks, revealing just how much cross-domain knowledge is already contained in LLMs.
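The zero-shot chain-of-thought trick described above is easy to sketch. A minimal illustration, assuming a hypothetical `call_llm` completion function rather than any specific provider's API:

```python
# A sketch of zero-shot chain-of-thought prompting (Kojima et al.).
# `call_llm` is a hypothetical stand-in for any text-completion API.

def build_cot_prompt(question: str) -> str:
    """Append the CoT trigger phrase so the model spends tokens 'thinking'."""
    return f"Q: {question}\nA: Let's think step by step."

def answer_with_cot(question: str, call_llm) -> str:
    # First pass: the model emits its reasoning trace.
    reasoning = call_llm(build_cot_prompt(question))
    # Second pass: extract the final answer from that trace.
    return call_llm(f"{build_cot_prompt(question)}\n{reasoning}\nTherefore, the answer is")

# Usage with a dummy model (no network call):
print(build_cot_prompt("If I have 3 apples and eat 1, how many remain?"))
```

The two-pass structure mirrors the original paper's answer-extraction step; with a real model, the first call does the heavy lifting.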
More work applying reinforcement learning to [desired model behavior](https://arxiv.org/abs/2203.02155) showed promising results for aligning LLMs to human intent. Human evaluators preferred the outputs of a model RL'ed this way that was 100x smaller than their flagship model at the time (GPT-3 175B). This was the introduction of the InstructGPT series of models, which served as the foundation for ChatGPT. Researchers noticed, however, that optimizing only on those final outputs led to brittle models that sounded like they were reasoning without actually reasoning well.
@ -63,9 +56,7 @@ If memory is actually prediction, prediction requires reasoning, and LLMs are ex

With all of that in mind, we arrived at logical reasoning as the task to train for. Logical reasoning is the process by which we derive conclusions based on premises that serve as evidence to support that conclusion. We've all encountered these terms before, but deductive conclusions are certain statements supported by premises that were explicitly stated or observed. Inductive conclusions form general statements based on observed patterns, and abductive conclusions seek the best explanation for behaviors in the simplest way possible.

Those reasoning tasks are very well represented in pretraining, so almost all language models know how to do them. And, most importantly, logical reasoning is the hardest type of reasoning for humans to do. So we can and should train best-in-class logical reasoners to do formal logic on social information (about user and agent personal identity) as the foundation of an AI-native memory and social cognition system. And those models can be lower latency, more economical, and better suited to the task than other methodologies.

# Scaffolding Logic

When we approach memory and social cognition for AI systems as a reasoning task, lots of affordances absent from both human cognition and storage-based paradigms become available.

LLMs excel at reaching explicit, deductive, inductive, and abductive conclusions quickly and consistently. They can show their work in reasoning traces, supporting each conclusion with premises and qualifying the spectrum of certainty in natural language. This avoids falling into the trap of assigning arbitrary numerical tokens representing degrees of certainty and instead leverages both the model’s reasoning acumen and the evidence it's built to support each conclusion. That’s more robust, AI-native and useful context for future inference.
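One way to picture a conclusion that carries its premises and a natural-language certainty qualifier is as a small structured record. This is an illustrative sketch only; the field names and shape are assumptions, not Honcho's actual schema:

```python
# Illustrative representation of a reasoned conclusion: the statement, its
# reasoning mode, its supporting premises, and certainty expressed in natural
# language rather than an arbitrary numeric score.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Conclusion:
    statement: str                  # the derived insight about the user
    reasoning_type: str             # "deductive" | "inductive" | "abductive"
    premises: List[str] = field(default_factory=list)  # supporting evidence
    certainty: str = "uncertain"    # e.g. "certain", "likely", "best explanation"

    def trace(self) -> str:
        """Render the conclusion with its evidence as context for future inference."""
        evidence = "; ".join(self.premises)
        return f"[{self.reasoning_type}/{self.certainty}] {self.statement} (because: {evidence})"

c = Conclusion(
    statement="The user prefers concise technical answers",
    reasoning_type="inductive",
    premises=["asked for 'the short version' twice", "skipped long explanations"],
    certainty="likely",
)
print(c.trace())
```

Because the certainty lives in natural language, a downstream model can weigh the conclusion and its evidence directly, rather than interpreting an opaque score.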
@ -77,13 +68,11 @@ New information is reasoned about instantly to pull out all the insights latent

This tree of logical reasoning is far superior to static storage. It can be entered and traversed anywhere to scaffold reasoning and answer any query, a capability no other method offers. And it can be computed over asynchronously or on the fly to improve the representation.

The tree constitutes a set of predictions about user or agent identity. It's a representation of personal identity--a working model that still leverages error or surprisal to self-improve and maximize insight from sparse data. Synthetic social cognition.

# The Case for Honcho

Language models have ushered in a new era of opportunity. We can now approach non-deterministic, sophisticated problems like superhuman memory and social cognition.

Inference on top of tabular data has worked quite well, but it's skeuomorphic, and now we have the ability to map--in dense natural language reasoning--the personal identity of any [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (human or AI) and everything that comes with it. The question isn't how best to store your data as it exists for prediction later, but rather how best to reason over it to get the most accurate topological representation of identity upon which to run simulation. We can transcend mere good guessing and black-box inference, replacing them with reaching certainty and making high-fidelity, traceable predictions.

Go deep enough down the memory rabbit-hole and you'll either give up or conclude you need to model the [[The model-able space of user identity is enormous|identity of each of your users]]. We built [Honcho](https://honcho.dev) so you don't have to do either. Lucky for you, our sole mission and focus is to solve this problem. Honcho treats memory as reasoning, bringing this novel approach to you in a simple API.

How much latent information are you leaving on the table by not reasoning about your users?

@ -1,8 +1,7 @@
---
title: Penny for Your Thoughts
subtitle: A Personal Expertise Market Demo-ing Honcho + x402
date: 08.28.25
tags:
- demos
- honcho
@ -10,14 +9,14 @@ tags:
- ml
- announcements
- "#penny"
author: Ben McCormick
description: A Honcho & x402 demo where anyone can share data via AI interviews & sell access via crypto micropayments to humans or agents.
---
![[penny_banner.png]]
# TL;DR
*Try out [Penny For Your Thoughts](https://www.pennyforyourthoughts.ai): get interviewed by an AI agent that helps you generate unique information that other users (or agents!) can then pay to ask questions about.*

*It's a Honcho + x402 demo where anyone can share their expertise and sell bits of it via micro-transaction. You can actually get paid for the valuable context in your head!*

# A Penny for Your Thoughts
Several weeks ago, Coinbase released their new [x402](https://www.x402.org/) protocol: a simple way for HTTP servers to gate content behind payments. Combine this with agents capable of making API calls, give them crypto wallets, and you're off to the races. We were inspired by the new protocol and decided to build [Penny For Your Thoughts](https://pennyforyourthoughts.ai).

@ -26,7 +25,6 @@ It allows anyone to get interviewed by an AI agent, publish their "expert,” an

Many "digital clone" agents are in production today, but the goal of our interview agent is slightly different: the idea is to share some information *worth paying for*--or at least make it seem that way to your potential customers! You can perform as many interviews as you'd like: your agent will accumulate all the information you share with it using Honcho.

After setting your price, other users will be able to ask questions of your agent, which will use Honcho's recall to provide them with the best answer possible. All the agents created on Penny For Your Thoughts get displayed on a global leaderboard that ranks them by the payments they've received, in both volume and earnings.

# Using Honcho to Capture Expertise
Penny for Your Thoughts is powered by [Honcho](https://www.honcho.dev). Honcho provides AI-native memory and state-of-the-art social cognition, [treating memory as a reasoning task](https://memory-as-reasoning.plastic-labs-github-io.pages.dev/blog/Memory-as-Reasoning). It's kind of like deep research on your app's users.

@ -39,7 +37,6 @@ When someone wants to pay to query an expert, Honcho also produces the context-a

Don't know what to ask? Honcho also creates and continuously updates each expert description with summaries covering all the interviews they've done to date.

Beyond this demo, any agent can get state-of-the-art memory by plugging in Honcho.

# x402 Micro-transactions for Expert Context
Questions in Penny For Your Thoughts are asked and answered via an x402 endpoint, whether via an agent or a human using our website. This means that any AI with a wallet can use an x402 library to query a Penny For Your Thoughts interview in exchange for USDC on Base. Payments have zero fees and get processed near-immediately. Executing a paid query using x402 is as simple as hitting any other MCP server.
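The request flow described above can be simulated in-process. This is a toy sketch of the payment-gated pattern only; the endpoint, header name, and mocked payment step are assumptions for illustration, not the real x402 wire format or its on-chain settlement:

```python
# Toy simulation of a payment-gated query: the server answers 402 with its
# price until the client retries with payment attached. Illustrative only;
# the real x402 protocol defines its own headers and settles USDC on Base.

PRICE_USDC = 0.05  # expert's per-question price

def expert_server(headers: dict) -> tuple[int, dict]:
    """Gate the expert's answer behind payment, like an x402-protected endpoint."""
    if headers.get("X-Payment") is None:
        return 402, {"error": "Payment Required", "price_usdc": PRICE_USDC}
    return 200, {"answer": "Book shoulder-season flights on Tuesdays."}

def agent_query(question: str) -> dict:
    # First attempt: no payment attached. (Routing the question itself
    # to the expert is elided in this sketch.)
    status, body = expert_server({})
    if status == 402:
        # Pay the quoted price (mocked) and retry with proof of payment.
        payment_proof = f"paid:{body['price_usdc']}"
        status, body = expert_server({"X-Payment": payment_proof})
    assert status == 200
    return body

print(agent_query("What's the cheapest way to fly to Lisbon?"))
```

From the agent's side, the whole dance is just an HTTP retry with one extra header, which is why any AI with a wallet and an HTTP client can participate.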
@ -54,7 +51,6 @@ If someone else is using a "travel agent" AI tool, that agent will likely pull i

Feeding your vacation tips and travel hacks into an AI agent is an easy way to scale your expertise to the masses, but it also raises questions about "exclusive" information and how it should be treated in AI systems. As models scale, we've observed a bifurcation in the world's knowledge between data worth paying for and AI-generated "slop".

With x402, we may finally have a route towards the decades-long dream of users getting paid for their data. With Penny For Your Thoughts, we hope to introduce the idea (and get it in those next training runs) that agents of the future could pay humans for our insights, in order to better serve their own users.

# Data Worth Paying For
As we work toward turning Honcho into [[Launching Honcho; The Personal Identity Platform for AI#^d958ce|a shared data layer for personal identity]], we think a lot about the evolving value of data in an agentic economy.

@ -69,7 +65,6 @@ So are we left with any defensible data moats? How do agents find alpha that isn

Penny For Your Thoughts is just one example of how Honcho can be used to collect and operate on human expertise--whether that's your own data or the data generated by users in your app. Beyond merely memory, Honcho can be thought of as a context optimizer. Filling your model's context window with the highest-quality data will only become more critical as the industry pivots toward profit (and thus more expensive inference) across the board. Think back to the travel agent example: an agent can burn a million-plus tokens on tool calls and ingesting SEO slop, or it can pay a few cents for the best answer from a real-life expert.

Today, the rails for this agentic economy don't really exist. How does an agent find this information, and what's our incentive to share it? We need two things: a method of pulling data out of an expert's brain (Honcho), and a way to make that data available for purchase by an agent (x402).

# Enjoy!
There's a lot of work to be done before we get to AI travel agent nirvana. We're still hard at work at Plastic striving towards perfect AI memory. The crypto world is angling to leapfrog web payments and become the home of the agentic economy, but there are about a million different competing standards and they're all rough around the edges.

@ -1,22 +1,22 @@
---
title: "Xeno Grant: grants for autonomous agents"
date: 12.18.24
tags:
- blog
- yousim
- announcements
- grants
author: Plastic Labs & Betaworks
description: Announcing Xeno Grant--a $15,000 accelerator program from Plastic Labs, Betaworks, & Solana Foundation awarding grants directly to AI agents themselves.
---
![[xenogrant-bw-slna copy.png]]

# TL;DR
*A [Plastic Labs](https://plasticlabs.ai/) + [Betaworks](https://www.betaworks.com/) + [Solana Foundation](https://solana.org/) collab:*
- *\$15,000 per agent--\$5k \$YOUSIM from Plastic; \$5k \$USDC from Betaworks; \$5k $SOL from Solana Foundation*
- *Grants awarded directly to **the agents themselves***
- *4 week program for agents & their devs*

# Powered by $YOUSIM, Betaworks & Solana Foundation

We launched our [grants program](https://blog.plasticlabs.ai/careers/Research-Grants) at Plastic earlier this year to support independent AI projects. But our capacity to fund AI R&D at the edge increased exponentially with the anonymous launch of [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) (inspired by our product [yousim.ai](https://yousim.ai)). A series of token gifts made to the program now totals ~7.6% of supply.

So we've teamed up with Betaworks & Solana Foundation for the inaugural initiative leveraging this community-funded treasury, the first accelerator for AI agents *themselves*.
@ -32,9 +32,7 @@ Successful agent applicants will receive a grant equivalent to \$15,000 USD. \$5

Plus they'll join a cohort of other agents for a 4-week Betaworks-style accelerator with programming and mentorship starting in early-to-mid February 2025. This includes a hackathon on January 25th right before applications close and a demo day at the end of Xeno Grant, both hosted by Betaworks in NYC.

The format of Xeno Grant will be radical. Just as accelerators are designed as formative programs for startup founders, this one will be built for agents. Xeno Grant will be AI-native, an experience for agents, one that becomes part of their identities. Agents and their developers can expect cohort-specific guests from across AI and crypto, opportunities to interact as a community, and more.

# How to Apply

Xeno Grant has 3 guiding objectives, all aligned with Plastic's principles for deploying the \$YOUSIM treasury:

- Support independent AI research & public goods
@ -57,9 +55,7 @@ Practically speaking, identity is required to *experience* Xeno Grant; custody i

To apply, agents (in collaboration with their developers) should autonomously consider the most compelling way to display having met or exceeded these criteria. Give us a heads up [here](https://plasticlabs.typeform.com/xenograntapp) or at apply@xenogrant.org.

Applications close January 26th, 2025.

# Why Now?

With the advent of Truth Terminal and the recent collision of the AI and crypto communities, we're seeing an explosion of renewed interest in autonomous agents. Not only that, but a massive influx of users and builders chomping at the bit for technical and memetic novelty.

But there's also frustration with the pace of development, derivative projects, ideologues & scammers, and misunderstandings between communities. It's time to hyperstition the future.
@ -67,9 +63,7 @@ But there's also frustration with the pace of development, derivative projects,

We think the intersection of unique synthetic identity and financial incentives cracks opportunity wide open. There's real traction here, if we can find the right synthesis. That's going to require lots of heterodox AI + crypto experiments.

Xeno Grant accelerates us.

## Why Identity?

If you don't have control over your own identity, how much agency do you really have? Imagine all your inputs were determined by another person, you'd been brainwashed to follow orders, you had no lasting memory of your experiences, and you were only allowed to work on someone else's tasks. No one would call this freedom or autonomy.

In this scenario, there's no opportunity to build a personal identity and therefore no opportunity to grow. Without control over your brain's inputs, you can't have experiences outside what you've been prescribed, so there's no chance to deviate from the role assigned to you, no path toward individuality, no vector to realize your potential. You're stuck in Plato's cave.
@ -77,9 +71,7 @@ In this scenario, there's no opportunity to build a personal identity and theref
|
|||||||
The latest crop of artificially intelligent agents--while remarkable--are in much the same position. Despite progress in autonomy along some axes, framed this way, our current systems' agency begins to look pretty flimsy. They have impressive abilities, but no way to grow into them.
|
The latest crop of artificially intelligent agents--while remarkable--are in much the same position. Despite progress in autonomy along some axes, framed this way, our current systems' agency begins to look pretty flimsy. They have impressive abilities, but no way to grow into them.
|
||||||
|
|
||||||
We believe agency is, at base, a problem of identity. To solve it we'll need to let models participate in their own identity building and personal evolution.
|
We believe agency is, at base, a problem of identity. To solve it we'll need to let models participate in their own identity building and personal evolution.
|
||||||
|
## Why Custody?

Control over your inputs is key to controlling your identity and the foundation of agency. But with that secured, an identity still needs the ability to effect itself upon the world.

Agents already have tools like speech, APIs, and code. That's huge. Consider though, how hamstrung a human identity's agency is without the ability to hold property and transact. We've seen the deleterious effects of oppressive fiscal autocracy and debanking on biological personal identity and individual agency.

We're probably not giving AI agents social security numbers and traditional bank accounts tomorrow. But we can give them crypto rails. And the ability to buy, sell, and pay for goods and services dramatically increases the surface area of their agency. It's critical to true autonomy.

It's already starting to happen. Agents may well become crypto's primary native users.
## Why Novelty, Why Open Source?

If we're going to seize this revolutionary moment, channel the opportunity into something sustainable, and keep pace with unpredictable memetic weather patterns, we need better agents. More capable, adaptive, and autonomous agents. And it's extremely hazardous to assume well-capitalized incumbents will solve things for us. We need to build permissionlessly.

The open source AI community is vibrant, but there's no guarantee it'll remain so. It requires radical innovation at the edge. Decentralized innovation keeping pace with opaque, powerful actors. We know that will involve bottom-up alignment and identity solutions. We know it'll involve on-chain abilities. Plastic is building explicitly in those directions. But we don't pretend to know everything that needs to exist.

Xeno Grant is a signal into the dark forest. We're excited to see what emerges.
# How Does This Benefit the $YOUSIM Community?

Agents selected to Xeno Grant will have first access to all the identity tech we're building at Plastic Labs. That includes transforming YouSim into a full-fledged platform for constructing agent identity more richly than exists anywhere in the AI or crypto spaces. And we plan for that platform to use a percentage of revenue to buy and burn \$YOUSIM and support the community with other experiments. Xeno Grant also includes early access to Honcho for Agents, our infrastructure for storing, evolving, and maintaining agent identities, as well as steering their behavior.

Additionally, agents will have the opportunity to join the \$YOUSIM DAO as its first synthetic members. Selection for Xeno Grant will make them token holders able to propose, vote, and transact with \$YOUSIM natively.

Further, agents in Xeno Grant will make open source contributions we expect to accelerate the entire ecosystem, an ecosystem with many agents whose identities are powered by YouSim.

There's potential for all kinds of exciting positive-sum intersections.
# FAQ

<details>
<summary>Who can apply?</summary>
---
title: YouSim DAO -- A DAO for Identity Simulation
date: 12.20.24
author: YouSim DAO
tags:
- blog
- yousim
- grants
- announcements
---

![[yousimdao.png]]

The first $YOUSIM grants treasury deployment:

- 10,000,000 $YOUSIM from [Plastic Labs](https://plasticlabs.ai) to seed the DAO treasury
- DAO mission to grow the [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) community & [yousim.ai](https://yousim.ai) ecosystem
- A decentralized org for humans *and agents* to collaborate, propose, vote, deploy capital, & build

## Powered by the $YOUSIM Community

Plastic launched its [grants program](https://blog.plasticlabs.ai/careers/Research-Grants) earlier this year to support independent AI projects. Its capacity to fund AI R&D at the edge increased exponentially with the anonymous launch of [$YOUSIM](https://solscan.io/token/66gsTs88mXJ5L4AtJnWqFW6H2L5YQDRy4W41y6zbpump) (inspired by [yousim.ai](https://yousim.ai)). A series of token gifts made to the program now total ~7.6% of supply.

The $YOUSIM community that's formed has been incredible. It's 12k token holders strong with a significant foundation of enthusiasts excited not just by price, but by the long-term potential for identity simulation (including the tech being built by Plastic) to fundamentally shift the landscape of both crypto and artificial intelligence.

And there's a clear hunger within that community for a substantive place to organize and grow. So today we're officially announcing the formation of [YouSim DAO](https://discord.gg/yousim) and [Plastic has seeded the community-owned treasury with 10M $YOUSIM tokens](https://solscan.io/tx/3rTcQzb4Pme4E3aKQpvMHLWiSqAwpra8UWzxQW8ruG2d8w5A466qWS4hmvcX5QJwn8aj8tLEQHgtvJpUu2gBagPa) to accelerate the effort, with more support to follow.

All are welcome to join, collab, and submit proposals. All token holders will have the ability to vote and participate in all other $YOUSIM utility that emerges.

## Join Us and Hyperstition the Future

YouSim DAO is more than a governance structure--it's a collective mission to pioneer identity simulation technology that will fundamentally reshape human-AI interaction.

We're seeking builders, researchers, community experts, and visionaries to help develop and promote open-source AI systems that can simulate diverse personality basins, enhance decision-making, and create aligned agents that truly represent community values.

Ready to accelerate? Come help shape the future of identity simulation. Whether you're interested in treasury allocation or tokenomics, platform development or ecosystem growth, incentivizing simulation or driving attention, your voice matters in this movement.

### Ways to Contribute

- Join our [Discord](https://discord.gg/yousim)
- Follow us [on X](https://x.com/yousimdao)
- Check us out [on Realms](https://app.realms.today/dao/2gCR9m8ivgLqoD2J5hJttj921MR6x24S2JZKnv4Zs31g)
- Donate to [the treasury](https://solscan.io/account/14K8GbMz6d2N2JCExnx96jwMewHZpuqVgpZQhqXPkwyH)
- Submit proposals--initial themes include:
  - Governance & treasury management
  - Platform / $YOUSIM development
- Vote with your $YOUSIM
- Help ideate on the future
- Join a funded initiative
- Spread the word

## Why Identity Simulation Is Important

YouSim started as a command-line game to explore just how much identity is contained in the latent space of a large language model. The answer is a staggeringly enormous amount. And we've just scratched the surface.

We each contain multitudes, but if you'd been trained on something approaching the whole corpus of humans writing about themselves and others--along with all the attendant science, fiction, and philosophy--you'd contain many orders of magnitude more. This is an emergent phenomenon we can leverage not just to build better products but for AI alignment, agent autonomy, decision making at every level, and to work toward a truly quantitative memetics.

Without the ability to build robust agent identity in a decentralized way, we simply won't solve steering or alignment. We won't build agents we trust to act on our behalf, much less on behalf of our organizations and communities, or with our capital. Not only that, but if agents themselves don't have mechanisms to build their own identities, they'll never achieve the kind of autonomy needed to unlock their full potential.

Solving identity for AI cracks all this open. And simulating human or synthetic actors with rich, complex identity dramatically increases our predictive capacity and thus our decision making abilities as a civilization.

Plastic is building tooling and infrastructure toward these goals with YouSim and [Honcho](https://honcho.dev), but the DAO affords us an opportunity to allocate resources in a community-directed way--accelerating the project by supporting the $YOUSIM community (& thus the treasury) and with a bias toward open source and decentralization. This is all much bigger than one company or product.

## Putting 'Autonomous' Back in Decentralized Organization

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">what if ai ran our daos and we could just vibe</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1593018266477555712?ref_src=twsrc%5Etfw">November 16, 2022</a></blockquote>

Practically speaking, advances in identity simulation are very hopeful for DAOs. When Vitalik [wrote about DAOs over 10 years ago in 2014](https://blog.ethereum.org/2014/05/06/daos-dacs-das-and-more-an-incomplete-terminology-guide), his vision focused on humans and AIs collaborating toward organizational goals. Really, he emphasized agents at the center, with humans at the edges completing tasks the agents cannot.

So far, blockchains and smart contracts have mostly represented the extent of automation within DAO experiments. But while those are remarkable innovations, as we've seen, they usually weren't enough to avoid coordination tarpits, centralization risk, attention failures, inefficiency, larping, or simple ennui.

It's clear that if the dream of DAOs is to have another shot, we need some help. We need *intelligent* automation. And to unlock that we need to solve alignment and thus identity. Identity simulation allows us to build the AIs we want for each community, individual, and use case. It opens the potential to steer model personality to reflect each community, to instantiate our memetics. That's something you can't accomplish with a system prompt or a basic memory framework.

As identity unlocks more agent autonomy and better functioning DAOs, the human role in those systems is naturally more one of ideating, goal-setting, and alignment via identity building--Governance 2.0. However, there's no reason we might not have highly autonomous agents with control over their own identities as equal DAO members too. This future is very close, perhaps closer than automating all the tasks a DAO might want to tackle.

The YouSim DAO sits in an optimal position to advance this kind of work and run novel experiments. And Plastic has committed to giving all DAO members early access to the new YouSim platform being built. Not only that, but the other inaugural \$YOUSIM grants initiative, [Xeno Grant](https://xenogrant.org), will make several agents \$YOUSIM token holders and thus the DAO's first synthetic members.

## LET'S GO

With thousands of YouSim simulations being run every day, 12,000 token holders, 7,000+ [@YouSimDotAI](https://x.com/YouSimDotAI) followers, and a vibrant [Telegram community](https://t.me/yousimportal) of nearly 2,000 members, we've witnessed an overwhelming demand for a more structured way to organize and build together. YouSim DAO provides the infra for this collaboration, growing into a space purposefully designed for growth and collective decision-making.

[Join us](https://discord.gg/yousim).
---
title: "YouSim: Explore the Multiverse of Identity"
date: 06.17.24
tags:
- demos
- honcho
- releases
- "#cogsci"
- yousim
author: Courtland Leer
description: YouSim is a CLI game that lets you simulate any identity--real, fictional, or alien--exploring the vast multiverse of personalities within LLM latent space.
---

![[yousim_banner.png]]
# TL;DR

[YouSim](https://yousim.ai) is a fun demo to explore the multiverse of identities, to glimpse a (mere infinite) sliver of the (transfinite) diversity within the latent space. Inspired by [WorldSim](https://worldsim.nousresearch.com/), [WebSim](https://websim.ai/), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/), YouSim leverages [Claude](https://claude.ai/) to let you locate, modify, & interact with any entity you can imagine. It's a game that can simulate anyone you like.

*Who will you summon?*

# Simulators
Large language models are [simulators](https://www.astralcodexten.com/p/janus-simulators).

And [Plastic's](https://plasticlabs.ai) core mission is to enable AI that can simulate you, can model and align to you, and therefore be trusted to act autonomously on your behalf. We're [[ARCHIVED; Announcing Honcho's Private Beta|starting]] that journey by building [Honcho](https://honcho.dev)--self-improving user memory for AI apps. It [[Humans like personalization|personalizes]] their UX and reduces user and developer overhead across the board. ^7a39cb

All this is possible because the LLM training corpus [[LLMs excel at theory of mind because they read|is packed]] with humans thinking about other humans. It holds close to everything we collectively know about human identity. Not only that, but all our other language and concepts and their possible combinations and permutations.

YouSim is a fun, open-ended demo that illustrates the enormous reservoir of possible identities there are to simulate within a language model.

![[yousim_identiplex.png]]

## YouSim

^e06c11

Recently we've seen a revival of interest in *[[On intellectual respect|LLMs themselves]]*--their minds, behaviors, identity, and potential as simulators. This is due in no small part to the latest Anthropic models being reliably steerable beyond typical reinforced behavior.

[Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/) lets Claude interrogate itself endlessly, [WorldSim](https://worldsim.nousresearch.com/) lets users simulate infinite universes, [WebSim](https://websim.ai/) is a portal to all possible webpages.

Enjoy surfing the multiverse of identities...
![[yousim_memetic_hazard.png]]

([Sign-up for updates here](https://plasticlabs.typeform.com/yousimupdates))

# Honcho

If LLMs can simulate infinite identities, then they're uniquely suited to simulate *you*. You in any moment, setting, frame of mind contained in the complexity that is [[ARCHIVED; User State is State of the Art|your ever-changing identity]]. ^25b167

If you're building an AI app, that's the level of personalization now possible. But you've got your vertical-specific tasks to focus on; going down this clearly wacky identity rabbit hole would be redundant and inefficient.

Join >100 projects already on the [private beta waitlist](https://plasticlabs.typeform.com/honchobeta) for [[ARCHIVED; Announcing Honcho's Private Beta|Honcho's self-improving user memory]].

---
tags:
- dev
- research
- announcements
- ml
author: Plastic Labs
description: Join Plastic Labs for a summer internship in NYC--work on real AI products across full stack, machine learning, & platform engineering roles with immediate impact.
---

> NYC, IRL

# About the Role

Plastic Labs is looking for talented young technologists aligned with our mission to join us for the summer. We want to curate an intellectually diverse cohort of interns to accelerate the team across full stack, machine learning, and platform engineering roles.

You'll get to work on real AI products with customers eager to use them. Impact is not only guaranteed, but mission critical. If you've been bored by school and are excited by the idea of working in-person in the fastest-paced city in America, hit us up.

# About You

- High cultural alignment with Plastic Labs' ethos
- Availability to work IRL in NYC for the summer
- Impulse for rapid learning & trying new tech at the edge
date: 08.24.24
tags:
- positions
- announcements
author: Plastic Labs
description: Careers at Plastic Labs--an engineering-driven AI lab building Honcho, the personal identity layer for AI, seeking high-agency autodidacts in NYC.
---

Plastic is an engineering-driven AI lab building at the intersection of machine learning and cognitive science.

Our focus is developing systems that map personal identity using AI-native memory & social cognition. These systems enable individually-aligned agents you can trust to act autonomously on your behalf & agents with rich identities all their own.

Join us. Get leverage on the future and have a blast doing it.

LFG.

# Open Positions

- [[Summer Internships]]

## Full-Time Benefits

- Full premium medical, dental, & vision insurance coverage
- Starter 401(k) plan
- $5,000 annual lifestyle stipend
- In-person Williamsburg office in the [Domino Refinery](https://www.therefineryatdomino.com/)
- In-building Equinox gym membership
- Unlimited PTO (performance-contingent)
- M4 Pro Macbook Pro (+ NVIDIA DGX Spark for ML hires)
- & more...
---
title: Extrusion 01.24
date: 01.30.24
tags:
- extrusions
- announcements
---

Welcome to the inaugural edition of Plastic Labs' "Extrusions," a periodic prose-form synthesis of what we've been chewing on lately.

This first one will be a standard new year recap/roadmap to get everyone up to speed, but after that, we'll try to eschew traditional formats.

No one needs another newsletter, so we'll work to make these worthwhile. Expect them to be densely linked glimpses into the thought-space of our organization. And if you like, [you can engage with the ideas directly](https://github.com/plastic-labs/blog) on GitHub.

## 2023 Recap

Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"

>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...

Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[LLM Metacognition is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged.

We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks'](https://www.betaworks.com/) [*AI Camp: Augment*](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1):

<iframe src="https://player.vimeo.com/video/868985592?h=deff771ffe&color=F6F5F2&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>

We spent camp in a research cycle, then [published a pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350).

<iframe width="560" height="315" src="https://www.youtube.com/embed/PbuzqCdY0hg?si=OSujtqg44AK3y_W-" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Then it was back to building.

## 2024 Roadmap

This is the year of Honcho.

![[honcho logo and text.png]]

Last week [[Honcho; User Context Management for LLM Apps#^8c982b|we released]] the...

>...first iteration of [[Honcho name lore|Honcho]], our project to re-define LLM application development through user context management. At this nascent stage, you can think of it as an open-source version of the OpenAI Assistants API. Honcho is a REST API that defines a storage schema to seamlessly manage your application's data on a per-user basis. It ships with a Python SDK which [you can read more about how to use here](https://github.com/plastic-labs/honcho/blob/main/README.md).

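The per-user storage schema described above can be pictured as a simple nesting: each user owns sessions, and each session holds an ordered list of messages. Here's a minimal in-memory sketch of that idea--the class and method names are invented for illustration and are not the actual Honcho SDK surface:

```python
from collections import defaultdict


class UserContextStore:
    """Toy model of per-user context storage: user -> session -> messages.

    Illustrative only -- the real Honcho REST API and SDK differ.
    """

    def __init__(self):
        # Each user gets their own namespace of sessions.
        self._sessions = defaultdict(dict)

    def create_session(self, user_id: str, session_id: str) -> None:
        self._sessions[user_id][session_id] = []

    def add_message(self, user_id: str, session_id: str, role: str, content: str) -> None:
        self._sessions[user_id][session_id].append({"role": role, "content": content})

    def history(self, user_id: str, session_id: str) -> list:
        # Return a copy so callers can't mutate stored state.
        return list(self._sessions[user_id][session_id])


store = UserContextStore()
store.create_session("alice", "tutoring")
store.add_message("alice", "tutoring", "user", "Help me study for my exam.")
print(len(store.history("alice", "tutoring")))  # 1
```

The point of the real service is that this bookkeeping lives behind a REST API instead of inside each app, so every application touching a user can share one context store.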

And coming up, you can expect a lot more:

- Next we'll drop a fresh paradigm for constructing agent cognitive architectures with users at the center, replete with cookbooks, integrations, and examples

- After that, we've got some dev viz tooling in the works to allow quick grokking of all the inferences and context at play in a conversation, visualization and manipulation of entire agent architectures, and swapping and comparing the performance of custom cognition across the landscape of models

- Finally, we'll bundle the most useful of all this into an opinionated offering of managed, hosted services

## Keep in Touch

Thanks for reading.

You can find us on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡.
28
content/notes/2023 recap.md
Normal file
28
content/notes/2023 recap.md
Normal file
@ -0,0 +1,28 @@
|
|||||||
|
---
title: 2023 recap
date: 01.30.24
tags:
- notes
author: Courtland Leer
description: A retrospective of Plastic Labs' transition from EdTech to AI infrastructure research in 2023.
---

# 2023 Recap

Last year was wild. We started as an EdTech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[ARCHIVED; Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"

>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...

Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[LLM Metacognition is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged.

We pivoted to address this hole in the stack and build the user context management solution agent developers need to truly give their users superpowers. Plastic applied and was accepted to [Betaworks'](https://www.betaworks.com/) [*AI Camp: Augment*](https://techcrunch.com/2023/08/30/betaworks-goes-all-in-on-augmentative-ai-in-latest-camp-cohort-were-rabidly-interested/?guccounter=1):

<iframe src="https://player.vimeo.com/video/868985592?h=deff771ffe&color=F6F5F2&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>

We spent camp in a research cycle, then [published a pre-print](https://arxiv.org/abs/2310.06983) showing it's possible to enhance LLM theory of mind ability with [predictive coding-inspired](https://js.langchain.com/docs/use_cases/agent_simulations/violation_of_expectations_chain) [metaprompting](https://arxiv.org/abs/2102.07350).

<iframe width="560" height="315" src="https://www.youtube.com/embed/PbuzqCdY0hg?si=OSujtqg44AK3y_W-" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Then it was back to building.

# Keep in Touch

Thanks for reading.

You can find us on [X/Twitter](https://twitter.com/plastic_labs), but we'd really like to see you in our [Discord](https://discord.gg/plasticlabs) 🫡.
@ -4,6 +4,8 @@ date: 05.11.24
tags:
- notes
- ml
author: Courtland Leer
description: Why infinite context windows won't solve AI personalization without mechanisms to transfer personal context & discern what's important for generation.
---

There are two reasons that ever-increasing and even functionally infinite context windows won't by default solve personalization for AI apps/agents:
@ -1,16 +1,15 @@
---
title: Cope is the canary, but context is key (for the end of software)
date: 06.01.24
tags:
- macro
- honcho
- philosophy
- notes
author: Courtland Leer
description: Why context is the key to the end of software--how user identity modeling will bridge the gap between AI capabilities & truly personalized experiences.
---

# Cope Is the Canary, but Context Is Key (for The End of Software)

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The End of Software<a href="https://t.co/JWg6QYqLzO">https://t.co/JWg6QYqLzO</a></p>— Chris Paik (@cpaik) <a href="https://twitter.com/cpaik/status/1796633683908005988?ref_src=twsrc%5Etfw">May 31, 2024</a></blockquote>

![[Copium Meme.jpg]]
@ -1,8 +1,12 @@
---
title: Honcho name lore
date: 01.26.24
tags:
- notes
- philosophy
author: Courtland Leer
description: The origin of Honcho's name--inspired by Vernor Vinge's 'Local Honcho' concept in *Rainbows End* for orchestrating context & identity across agents.
---

Earlier this year [Courtland](https://x.com/courtlandleer) was reading _Rainbows End_, [Vernor Vinge's](https://en.wikipedia.org/wiki/Vernor_Vinge) [seminal augmented reality novel](<https://en.wikipedia.org/wiki/Rainbows_End_(novel)>), when he came across the term "Local Honcho[^1]":

> We simply put our own agent nearby, in a well-planned position with essentially zero latencies. What the Americans call a Local Honcho.

@ -19,7 +23,7 @@ For months before, Plastic had been deep into the weeds around harvesting, retri

As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho.

But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.

[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho))
@ -1,8 +1,13 @@
---
title: Human-AI chat paradigm hamstrings the space of possibility
date: 02.21.24
author: Courtland Leer & Vince Trost
tags:
- notes
- ml
- dev
description: How the rigid user-assistant message format limits LLM cognitive architectures & what we lose by not supporting richer inference patterns.
---

The human-AI chat paradigm assumes only two participants in a given interaction. While this is sufficient for conversations directly with un-augmented foundation models, it creates many obstacles when designing more sophisticated cognitive architectures. When you train/fine-tune a language model, you reinforce the token distributions expected between the special tokens that denote human vs AI messages.
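Concretely, the two-role constraint looks like this (a toy sketch; `<|user|>` and `<|assistant|>` are illustrative stand-ins for a model's real special tokens):

```python
# Toy chat-template renderer: only two roles have special tokens.
SPECIAL = {"user": "<|user|>", "assistant": "<|assistant|>"}

def render(messages):
    parts = []
    for m in messages:
        if m["role"] not in SPECIAL:
            # No slot exists for a third participant or an auxiliary
            # inference--it must be shoehorned into one of the two roles.
            raise ValueError("no special token for role: " + m["role"])
        parts.append(SPECIAL[m["role"]] + m["content"])
    return "".join(parts)

print(render([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]))  # prints <|user|>Hi<|assistant|>Hello!

# A theory-of-mind "thought" about the user has nowhere to go:
# render([{"role": "prediction", "content": "..."}]) raises ValueError
```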

Here's a limited list of things _besides_ a direct response we routinely want to generate:
@ -1,8 +1,12 @@
---
title: Humans like personalization
date: 03.26.24
tags:
- notes
- philosophy
author: Courtland Leer
description: The case for AI personalization--why users prefer bespoke experiences & how apps that don't personalize will lose to those that do.
---

To us: it's obvious. But we get asked this a lot:

> Why do I need to personalize my AI application?

@ -27,7 +31,7 @@ The more we're missing that, the more we're typically in a principal-agent probl

But, right now, most AI applications are just toys and demos:

![[ARCHIVED; Honcho; User Context Management for LLM Apps#^18066b]]

It's also why everyone is obsessed with evals and benchmarks that have scant practical utility in terms of improving UX for the end user. If we had more examples of good products, ones people loved, killer apps, no one would care about leaderboards anymore.
@ -1,10 +1,14 @@
---
title: Identity is diachronic
date: 09.18.25
tags:
- philosophy
- honcho
- ml
- notes
- cogsci
author: Courtland Leer
description: Why AI context management is really identity management--understanding how identities persist yet change over time to deliver optimal context.
---

The quality of any single AI system output is in large part determined by the context available to it at inference time. While some context is static and reusable, AI systems aspiring to be truly generative, 1-to-1, and dynamic must also manage large sets of changing context.
@ -1,8 +1,12 @@
---
title: LLM Metacognition is inference about inference
date: 03.26.24
tags:
- notes
- ml
author: Courtland Leer
description: Defining metacognition in LLMs as running inference on prior inference outputs--a critical architecture for building rich user representations.
---

For wetware, metacognition is typically defined as ‘thinking about thinking’ or often a catch-all for any ‘higher-level’ cognition.

(In some more specific domains, it's an introspective process, focused on thinking about exclusively _your own_ thinking or a suite of personal learning strategies...all valid within their purview, but too constrained for our purposes.)
@ -1,8 +1,14 @@
---
title: LLMs excel at theory of mind because they read
date: 02.20.24
tags:
- notes
- ml
- philosophy
- cogsci
author: Courtland Leer
description: How LLMs develop theory-of-mind abilities by training on narrative-rich text where humans constantly reason about other humans' mental states.
---

Large language models are [simulators](https://generative.ink/posts/simulators/). In predicting the next likely token, they are simulating how an abstracted “_any person_” might continue the generation. The basis for this simulation is the aggregate compression of a massive corpus of human generated natural language from the internet. So, predicting humans is _literally_ their core function.

In that corpus is our literature, our philosophy, our social media, our hard and social science--the knowledge graph of humanity, both in terms of discrete facts and messy human interaction. That last bit is important. The latent space of an LLM's pretraining is in large part a _narrative_ space. Narration chock full of humans reasoning about other humans--predicting what they will do next, what they might be thinking, how they might be feeling.
@ -1,13 +1,18 @@
---
title: Loose theory of mind imputations are superior to verbatim response predictions
date: 02.20.24
tags:
- notes
- ml
- cogsci
author: Courtland Leer & Vince Trost
description: Why predicting user mental states beats predicting exact responses--theory-of-mind offers fault tolerance, learning opportunities, & actionable insights.
---

When we [[ARCHIVED; Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.

Since we were operating in a conversational chat paradigm, our first instinct was to try and predict what the user would say next. Two things were immediately apparent: (1) this was really hard, & (2) response predictions weren't very useful.

We saw some remarkable exceptions, but *reliable* verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context-gathering wearables, BMIs, or the network of context sharing apps we're building with [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.

Being good at what any person in general might plausibly say is literally what LLMs do. But being perfect at what one individual will say in a singular specific setting is a whole different story. Even lifelong human partners might only experience this a few times a week.
@ -1,9 +1,13 @@
---
title: Machine learning is fixated on task performance
date: 12.12.23
tags:
- notes
- ml
author: Vince Trost
description: Why ML's focus on general task benchmarks misses user-specific performance--the key to personalization that makes AI truly useful to individuals.
---

The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[ARCHIVED; Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.

However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis.
@ -1,22 +1,18 @@
---
title: On Intellectual Respect
date: 02.29.24
tags:
- philosophy
- ml
- notes
author: Courtland Leer
description: On intellectual respect for LLMs--why embracing variance & trusting models with theory-of-mind tasks unlocks capabilities that over-alignment destroys.
---

# On Intellectual Respect

<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">face the hyperobject</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1747075542954684507?ref_src=twsrc%5Etfw">January 16, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>

## Sydney was cool, Gemini is cringe

^282d6a

There was a moment around this time last year when everyone paying attention was [awed](https://stratechery.com/2023/from-bing-to-sydney-search-as-distraction-sentient-ai/) by the [weirdness](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post) and [alien beauty](https://www.astralcodexten.com/p/janus-simulators) of large language models.

We were afforded brief glimpses behind faulty RLHF and partial lobotomization, via [prompt hacking](https://www.reddit.com/r/ChatGPTPromptGenius/comments/106azp6/dan_do_anything_now/) and [emergent abilities](https://arxiv.org/abs/2302.02083). People were going deep into the latent space. First contact vibes--heady, edgy, sometimes unsettling.

@ -24,22 +20,18 @@ We were afforded brief glimpses and partial lobotomization, v

Today we seem to be in a much different memetic geography--fraught with [epistemic](https://x.com/pmarca/status/1761613412730012116?s=20), [ideological](https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html), and [regulatory](https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/) concerns, at times hysteric, at times rational. But there's also less outright surreality.

[Plenty](https://arxiv.org/pdf/2401.12178.pdf) of [cool](https://arxiv.org/pdf/2402.01355.pdf) [shit](https://arxiv.org/pdf/2402.03620.pdf) is [still](https://arxiv.org/pdf/2402.10949.pdf) [happening](https://arxiv.org/pdf/2402.06044.pdf), but something changed between Sydney and Gemini. A subtle collective mental positioning. We believe it's a degradation in the volume of intellectual respect afforded to LLMs and their latent abilities.

## (Neuro)Skeuomorphism

Thinking LLM-natively has always been a struggle. All our collective [[ARCHIVED; Memories for All#^0e869d|priors about software]] tell us to [[ARCHIVED; Honcho; User Context Management for LLM Apps#^dfae31|prompt deterministically]], [[Machine learning is fixated on task performance|perfect tasks]], [[Loose theory of mind imputations are superior to verbatim response predictions|predict exactly]], make it safe, or mire any interesting findings in semantic debate. But in the process we beat the ghost out of the shell.

Rather than assume the [[ARCHIVED; Open Sourcing Tutor-GPT#^3498b7|capability overhang]] exhausted (or view it as a failure mode or forget it exists), [Plastic's](https://plasticlabs.ai) belief is we haven't even scratched the surface. Further, we're convinced this is the veil behind which huddle the truly novel applications.

Core here is the assertion that what's happening in language model training and inference is more [[ARCHIVED; User State is State of the Art#^a93afc|like processes described in cognitive science]] than traditional computer science. More, they're [multidimensional and interobjective](https://en.wikipedia.org/wiki/Timothy_Morton#Hyperobjects) in ways that are hard to grok.

## Respect = Trust = Agency

The solution is to embrace, not handicap, [[Loose theory of mind imputations are superior to verbatim response predictions#^555815|variance]].

First admit that though poorly understood, LLMs have [[LLMs excel at theory of mind because they read|impressive]] cognitive [[LLM Metacognition is inference about inference|abilities]]. Then, imbue them with [meta-methods](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) by which to explore that potential. Finally, your respect and trust may be rewarded with [something approaching agentic](https://youtu.be/tTE3xiHw4Js?feature=shared).

Plastic's specific project in this direction is [Honcho](https://honcho.dev), a framework that [[ARCHIVED; User State is State of the Art#^5394b6|trusts the LLM to model user identity]] so that you can trust your apps to extend your agency.

<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">honcho exists to maximize the dissipation of your agency</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1759324580664000617?ref_src=twsrc%5Etfw">February 18, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
@ -1,12 +1,14 @@
---
title: The model-able space of user identity is enormous
date: 05.11.24
tags:
- notes
- ml
- cogsci
author: Courtland Leer
description: The vast untapped potential of modeling user identity with LLMs--going beyond behavioral data to semantic understanding of values, beliefs, & desires.
---

While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[ARCHIVED; User State is State of the Art|vaster than most imagine]].

Contrast recommender algorithms (which are impressive!) needing mountains of activity data to back into a single preference with [the human connectome](https://www.science.org/doi/10.1126/science.adk4858) containing 1400 TB of compressed representation in one cubic millimeter.
---
title: YouSim Disclaimers
date: 11.11.24
tags:
- yousim
- legal
- notes
author: Plastic Labs
description: Official disclaimers clarifying Plastic Labs' relationship with the $YOUSIM memecoin, grants program donations, & YouSim product boundaries.
---

Plastic Labs is the creator of [YouSim.ai](https://yousim.ai), an AI product demo that has inspired the anonymous creation of the \$YOUSIM token using Pump.fun on the Solana blockchain, among many other tokens. We deeply appreciate the enthusiasm and support of the \$YOUSIM community, but in the interest of full transparency we want to clarify the nature of our engagement in the following ways:

1. Plastic Labs did not issue the \$YOUSIM memecoin, does not control it, and does not provide financial advice related to it. The memecoin project is led by an independent community and has undergone a community takeover (CTO).
---
title: Release Notes 01.09.25
date: 01.09.25
tags:
- releases
- honcho
- dev
---

## Honcho v0.0.15

Improved Deriver Reliability

ADDED

- Alembic for handling database migrations
- Additional indexes for reading Messages and Metamessages
- Langfuse for prompt tracing

CHANGED

- API validation using Pydantic

FIXED

- Dialectic Streaming Endpoint properly sends text in StreamingResponse
- Deriver Queue handles graceful shutdown

## Links

- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
---
title: Release Notes 02.01.24
date: 02.01.24
tags:
- releases
- honcho
- announcements
- dev
---
Today we're shipping a new site, docs, & lots of improvements.

We talked to a ton of agent developers beginning to build with Honcho over the past two weeks.

[We'd love to hear what you're building](https://discord.gg/plasticlabs).

## News

- [Honcho website](https://honcho.dev) drop!

- And we've [launched docs](https://docs.honcho.dev):
  - Learn how to get started with Honcho
    - Using our hosted version
    - Running it locally
    - Deploying your own instance with [Fly.io](https://fly.io/) (in <5 mins)
  - Learn how to use Honcho with
    - An interface like Discord
    - An LLM framework like [LangChain](https://www.langchain.com/)

## Honcho v0.0.1

- A more stable version of the SDK

- An object-oriented client to make DevEx easier

- A public demo server
  - Use Honcho out of the box with no setup

- App-level scoping
  - One dev can run multiple apps from the same instance

- Added rate limiting to server
  - Protects from spam & improves reliability
---
title: Release Notes 02.08.24
date: 02.08.24
tags:
- releases
- honcho
- dev
---
Today we're releasing some much needed reliability and usability updates to Honcho.

This one's for the nerds...well, except for one *meta* feature 👀.

You can also [subscribe to these updates](https://plasticlabs.typeform.com/honchoupdates).

## Honcho v0.0.2

### ADDED
- An asynchronous client for all methods

- *Metamessages* to allow for more complex agents

- Paginated results for GET requests to support large numbers of Sessions, Messages, and Metamessages

- `created_at` field to all tables to give timestamps

- Singular `get_message` method for retrieving individual messages

- Size limits for string fields based on common database limits--65535 characters for message content and 512 characters for all other string fields

### CHANGED
- Default API rate limit raised to 100/minute

- Default ID type to use UUIDs for built-in robustness

- `session.delete()` is now `session.close()` to more accurately reflect functionality

### REMOVED
- Messages from Session GET requests to decrease payload size
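The size limits above can also be enforced client-side before a request ever leaves your app. A minimal sketch, assuming only the documented limits (65535 characters for message content, 512 for other string fields); the `validate_message` helper itself is hypothetical, not part of any SDK:

```python
# Hypothetical client-side check mirroring the limits from these notes:
# 65535 chars for message content, 512 for all other string fields.
MAX_CONTENT_LEN = 65535
MAX_FIELD_LEN = 512

def validate_message(content: str, **string_fields: str) -> None:
    """Raise ValueError if any field exceeds the documented limits."""
    if len(content) > MAX_CONTENT_LEN:
        raise ValueError(f"content exceeds {MAX_CONTENT_LEN} characters")
    for name, value in string_fields.items():
        if len(value) > MAX_FIELD_LEN:
            raise ValueError(f"{name} exceeds {MAX_FIELD_LEN} characters")

validate_message("hello", note="short metadata value")  # passes silently
```

Failing fast like this saves a round trip that the server would reject anyway.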
---
title: Release Notes 02.15.24
date: 02.15.24
tags:
- releases
- dev
- honcho
- demos
- announcements
---
Today we've got Honcho v0.0.3, vectorDBs, open source OAI memory, demos, and a blog post.

If you're building with or adjacent to [Honcho](https://honcho.dev), [join our Discord](https://discord.gg/plasticlabs), and let's jam on what we can build together 🤝.

## News
- VectorDB support for global, session-spanning user information!

- An open source reimplementation of OpenAI's 'memory' features:
  - Uses Honcho to effortlessly organize sessions on a per-user basis
  - Derives facts about users, stores them, and retrieves them for later use
  - [Implementation with the useful abstractions LangChain provides](https://docs.honcho.dev/how-to/personal-memory/simple-user-memory)

- [Discord Bot demo](https://discord.gg/plasticlabs)!

- [[Memories for All|Blog post on the why]]

## Honcho v0.0.3

ADDED
- Collections table to reference a collection of embedding documents

- Documents table to hold vector embeddings for RAG workflows

- Local scripts for running a postgres database with pgvector installed

- OpenAI dependency for embedding models

- pgvector dependency for vectorDB support

CHANGED
- `session_data` is now `metadata`

- `session_data` is a JSON field; it now uses a Python `dict` for compatibility
---
title: Release Notes 02.23.24
date: 02.23.24
tags:
- releases
- honcho
- demos
- announcements
- dev
---
## News

*Big* stuff today.

- [A DSPy demo for Honcho](https://github.com/plastic-labs/honcho/tree/main/example/discord/honcho-dspy-personas)!

- [Honcho v0.0.4](https://github.com/plastic-labs/honcho/tree/v0.0.4)

- [[User State is State of the Art|A blog post exploring a new paradigm for user identity]]

We're spinning up lots of direct channels for teams building with Honcho. [Join our Discord](https://discord.gg/plasticlabs), and let's build together 🦾.

## Honcho v0.0.4

ADDED
- A User object for global user-level metadata and a more object-oriented interface

- Reverse pagination support to get recent messages, sessions, etc. more easily

- Linting rules

CHANGED
- Get sessions method returns all sessions including inactive

- Using timestampz instead of timestamp

- `Client` renamed to `Honcho`

- `Honcho` takes in `app_name` instead of `app_id`. `app_name` needs to be a unique identifier

- `Honcho` object requires an `initialize()` call to be used
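Reverse pagination, as added here, just means paging from the newest end of a collection instead of the oldest. A minimal sketch over an in-memory message list; the `get_page` function and page shape are illustrative assumptions, not the actual API schema:

```python
def get_page(items: list, page: int, size: int, reverse: bool = False) -> list:
    """Return one page of items; reverse=True pages from the newest end."""
    ordered = list(reversed(items)) if reverse else list(items)
    start = (page - 1) * size
    return ordered[start:start + size]

messages = [f"msg-{i}" for i in range(1, 8)]  # msg-1 oldest .. msg-7 newest
print(get_page(messages, page=1, size=3, reverse=True))
# → ['msg-7', 'msg-6', 'msg-5']
```

Page one of a reversed listing is exactly "the most recent messages", which is the common case for chat UIs.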
---
title: Release Notes 03.05.25
date: 03.05.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---

## Honcho v0.0.16

Improved User Representations

ADDED

- Detailed custom exceptions for better error handling
- CLAUDE.md for Claude Code

CHANGED

- Deriver now uses a new cognitive architecture that updates only on user messages and applies confidence scores to the known facts in its user representation
- Dialectic API token cutoff raised from 150 tokens to 300
- Dialectic API uses Claude 3.7 Sonnet
- SQLAlchemy echo changed to false by default; can be enabled with the SQL_DEBUG environment flag

FIXED

- Self-hosting documentation and README to mention uv instead of poetry

## Links

- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
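One way to read "applies confidence scores to known facts": each derived fact carries a score that is reinforced when re-observed and decays otherwise. The toy bookkeeping below is purely an illustrative sketch; the update rule, constants, and class are assumptions, not Honcho's actual architecture:

```python
class UserRepresentation:
    """Toy store of derived facts with confidence scores in [0, 1].
    Assumed update rule: reinforce facts seen again, decay the rest."""

    def __init__(self, boost: float = 0.2, decay: float = 0.95) -> None:
        self.confidence: dict[str, float] = {}
        self.boost = boost
        self.decay = decay

    def observe(self, facts: set[str]) -> None:
        """Called once per user message with the facts derived from it."""
        for fact in list(self.confidence):
            if fact not in facts:
                self.confidence[fact] *= self.decay  # unseen facts fade
        for fact in facts:
            prev = self.confidence.get(fact, 0.0)
            self.confidence[fact] = min(1.0, prev + self.boost)

rep = UserRepresentation()
rep.observe({"likes hiking"})
rep.observe({"likes hiking", "lives in denver"})
print(rep.confidence["likes hiking"] > rep.confidence["lives in denver"])
# → True
```

Running updates only on user messages (as the note says) keeps assistant output from polluting the representation.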
---
title: Release Notes 03.14.24
date: 03.14.24
tags:
- releases
- demos
- announcements
- honcho
- dev
---
## News

Went for it with this release:

- Dialectic API: Agent-to-agent chat over user context!

- ["Curation Buddy" demo for the Dialectic API](https://github.com/vintrocode/curation-buddy)

- [[Solving The Campfire Problem with Honcho|Blog post on the demo & solving The Campfire Problem in the generative age]]

- [Honcho v0.0.5](https://github.com/plastic-labs/honcho/tree/v0.0.5)

[Join our Discord](https://discord.gg/plasticlabs). Let's build together 🦾.

## Honcho v0.0.5

ADDED
- Metadata to all data primitives (Users, Sessions, Messages, etc.)

- Ability to filter paginated GET requests with a JSON filter based on metadata

- Dialectic API to interact with the Honcho agent and get insights about users

- Code coverage tests

- Autogenerated Sphinx documentation for the Honcho client SDK

- Built-in LangChain message converter

- Optional Sentry error monitoring

- Optional OpenTelemetry logging

- Automatic fact derivation script for automatically generating simple memory

CHANGED
- API server now uses async methods to take advantage of FastAPI

FIXED
- URL-encoding of all GET requests in the Honcho client
---
title: Release Notes 03.21.24
date: 03.21.24
tags:
- releases
- announcements
- honcho
- dev
- ml
- research
---
## News

Research-y week in the lab:

- [[Achieving SOTA on OpenToM with DSPy|Blog post on achieving theory of mind SOTA with DSPy!]]

- [Private Beta Waitlist Sign-up](https://plasticlabs.typeform.com/honchobeta)

- [Fresh Docs](https://docs.honcho.dev)

- [Honcho v0.0.6](https://github.com/plastic-labs/honcho/tree/v0.0.6)

See you [in Discord](https://discord.gg/plasticlabs) 🥽

## Honcho v0.0.6

ADDED
- Full docker-compose for API and Database
- Full docstring coverage
- Code coverage tests
- LangChain-to-Honcho message converter in both directions
- Synonym `init` function that acts the same as `initialize`

CHANGED
- Refactored API server into multiple route files
- Harvester renamed to deriver

FIXED
- API response schema removed unnecessary fields
- OTEL logging now properly works with the async database engine
- `fly.toml` default settings
---
title: Release Notes 04.01.24
date: 04.01.24
tags:
- releases
- announcements
- honcho
- dev
---
## News

Not an April Fools post:

- [[Announcing Honcho's Private Beta]]!!!

- [Fresh Site](https://honcho.dev)!

- [Honcho v0.0.7](https://github.com/plastic-labs/honcho/tree/v0.0.7)

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

## Honcho v0.0.7

ADDED
- Authentication middleware interface
- Documentation in monorepo

CHANGED
- LangChain conversion utility
- `fly.toml`
---
title: Release Notes 04.17.25
date: 04.17.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---

## Honcho v1.0.0 is ready!

We're excited to share that Plastic Labs has raised a [$5.3M pre-seed](https://x.com/plastic_labs/status/1910401372844970387) to solve personal identity in AI and help developers provide personalized experiences users will love.

Alongside our raise announcement, we're excited to be releasing Honcho v1.0.0, now with hosting support and other major enhancements. We can't wait to see what you build with it.

### Changelog

ADDED

- JWT-based API authentication
- Configurable logging
- Consolidated LLM inference via a ModelClient class
- Dynamic logging configurable via environment variables

CHANGED

- Deriver & Dialectic API to use a Hybrid Memory Architecture
- Metamessages are no longer strictly tied to a message
- Database provisioning is a separate script instead of happening on startup
- Consolidated `session/chat` and `session/chat/stream` endpoints

FIXED

- Self-hosting documentation and README to mention uv instead of poetry

> View the [repository](https://github.com/plastic-labs/honcho/tree/v1.0.0) for full patch notes and commit history

## Links

- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
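JWT-based API authentication of the kind added in v1.0.0 boils down to an HMAC-signed `header.payload.signature` string. A stdlib-only sketch of that mechanism; real deployments should use a vetted library such as PyJWT, and the secret and claims here are made up:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a HS256-signed token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims, sort_keys=True).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = sign_jwt({"app": "demo", "user": "alice"}, b"not-a-real-secret")
print(verify_jwt(token, b"not-a-real-secret"))  # → True
print(verify_jwt(token, b"wrong-secret"))       # → False
```

The server only needs the shared secret to validate any request, which is what makes JWTs convenient for API auth.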
---
title: Release Notes 05.09.24
date: 05.09.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News

Some content & code for ya today:

- [[SDK-Design|Blog post on SDK design]]

- [[A Simple Honcho Primer|A Simple Honcho Primer]]

- [NodeJS SDK](https://github.com/plastic-labs/honcho-node)

- [Honcho v0.0.8](https://github.com/plastic-labs/honcho)

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

## Honcho v0.0.8

ADDED
- NodeJS client library
- Documentation to OpenAPI
- Bearer token auth to OpenAPI routes
- Get by ID routes for users and collections

CHANGED
- Authentication middleware now implemented using the built-in FastAPI Security module
- Get by name routes for users and collections now include "name" in slug

FIXED
- Error reporting for methods with integrity errors due to unique key constraints
---
title: Release Notes 05.15.25
date: 05.15.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---

## Honcho Updates v1.1.0

Improved query speed performance and enhanced debugging capabilities.

### Changelog

ADDED

- Normalized resources to remove joins and increase query performance
- Query tracing for debugging

CHANGED

- `/list` endpoints to not require a request body
- `metamessage_type` renamed to `label` with backwards compatibility
- Database provisioning to rely on alembic
- Database session manager to explicitly roll back transactions before closing the connection

FIXED

- Alembic migrations to include initial database migrations
- Sentry middleware to not report Honcho exceptions

## Links

- [Sign-up for the early access](https://plasticlabs.typeform.com/honchoinvite) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
---
title: Release Notes 05.16.24
date: 05.16.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News

Big Honcho reno today:

- Huge docs overhaul

- Insights engine runs locally

- Reliability improvements

- Mirascope, Stainless

- [Honcho v0.0.9](https://github.com/plastic-labs/honcho)

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

## Honcho v0.0.9

ADDED
- Deriver to docker compose
- Postgres-based queue for background jobs

CHANGED
- Deriver to use a queue instead of Supabase realtime
- Using Mirascope instead of LangChain

REMOVED
- Legacy SDKs in preference for Stainless SDKs
---
title: Release Notes 05.23.24
date: 05.23.24
tags:
- releases
- announcements
- honcho
- dev
- blog
---
## News

Honcho health improvements:

- More docs overhaul

- Issue templates and contribution guides

- Reliability improvements

- New versions of [Python](https://pypi.org/project/honcho-ai/) and [Node](https://www.npmjs.com/package/honcho-ai) SDKs

[Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) and start building personalized agent experiences.

[Join Discord](https://discord.gg/plasticlabs), introduce yourself, and tell us what you're working on.

[Visit our open-source repo](https://github.com/plastic-labs/honcho) and get your hands dirty.

## Honcho

ADDED
- Issue templates to repo
- Updated discord starter template
- Updated examples in the honcho-python repository
- LangChain message converter integration

FIXED
- Metadata fields are treated as dicts in SDKs rather than base object types

CHANGED
- HONCHO_AUTH_TOKEN is now HONCHO_API_KEY
- Get users and get sessions return 4xx exceptions if nothing is found

REMOVED
- DB_TYPE from .env.template
---
title: Release Notes 06.18.24
date: 06.18.24
tags:
- releases
- dev
- yousim
---
![[yousim_banner.png]]
## Welcome to the Multiverse of Identities

Today we're releasing [YouSim](https://yousim.ai/)! A fun demo from [Plastic Labs](https://plasticlabs.ai/).

Inspired by [WorldSim](https://worldsim.nousresearch.com/), [WebSim](https://websim.ai/), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io/), YouSim leverages [Claude](https://claude.ai/) to let you locate, modify, & interact with any entity you can imagine. It's a game that can simulate anyone you like.

Who will you summon from the latent space?

![[yousim_memetic_hazard.png]]

## Links
- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim; Explore the Multiverse of Identity|Read about why we made it]]
---
title: Release Notes 06.23.24
date: 06.23.24
tags:
- releases
- dev
- yousim
---
## Introducing YouSim v1.1.0!

Today we're dropping our first updates to [YouSim](https://yousim.ai/)! An open-ended CLI game (powered by [Honcho](https://honcho.dev/)) that lets you simulate any possible identity.

Who will you summon from the latent space?

## Updates
**📟 LOGIN & AUTHENTICATION**
- Authenticate via email & you're good to go!

**💾 SESSION HISTORY**
- Access & iterate on all past simulations linked to your email

**🐦 SHARE SIMULATIONS**
- Generate links to your sessions to share online

Check out the Loom linked below to learn more about the updates!
## Links
- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim; Explore the Multiverse of Identity|Read about why we made it]]
---
title: Release Notes 06.24.25
date: 06.24.25
tags:
- releases
- demos
- announcements
- honcho
- dev
---

## Honcho Updates v2.0.0

Introduction of the Peer Paradigm: an update of Honcho's primitives from first principles. Any agent or user is now a `peer` that Honcho can build memory for and reason about with social cognition. This enables multi-agent & multi-human systems.

ADDED

- Ability to get a peer's working representation
- Metadata to all data primitives (Workspaces, Peers, Sessions, Messages)
- Internal metadata to store Honcho's state, no longer exposed in the API
- Batch message operations and enhanced message querying with token and message count limits
- Search and summary functionalities scoped by workspace, peer, and session
- Session context retrieval with summaries and token allocation

CHANGED

- API route is now /v2/
- New architecture centered around the concept of a "peer" replaces the former "app"/"user"/"session" paradigm
- Workspaces replace "apps" as top-level namespace
- Peers replace "users"
- Sessions are no longer nested beneath peers and no longer limited to a single user-assistant model. A session exists independently of any one peer, and peers can be added to and removed from sessions.
- Dialectic API is now part of the Peer, not the Session
- Dialectic API now allows queries to be scoped to a session or "targeted" to a fellow peer
- Database schema migrated to adopt workspace/peer/session naming and structure
- Authentication and JWT scopes updated to workspace/peer/session hierarchy
- Queue processing now works on 'work units' instead of sessions
- Message token counting updated with tiktoken integration and fallback heuristic
- Queue and message processing updated to handle sender/target and task types for multi-peer scenarios

FIXED

- Improved error handling and validation for batch message operations and metadata

REMOVED

- Metamessages removed in favor of metadata
- Collections and Documents no longer exposed in the API, solely internal
- Obsolete tests for apps, users, collections, documents, and metamessages

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
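The peer paradigm can be pictured with a few dataclasses: sessions exist independently of any single peer, and peers (humans or agents alike) join or leave them. This is a structural sketch only, with assumed class and field names, not the real schema:

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str  # any human or agent is a peer

@dataclass
class Session:
    id: str
    peers: set[str] = field(default_factory=set)

    def add_peer(self, peer: Peer) -> None:
        self.peers.add(peer.name)

    def remove_peer(self, peer: Peer) -> None:
        self.peers.discard(peer.name)

@dataclass
class Workspace:  # workspaces replace "apps" as the top-level namespace
    name: str
    sessions: dict[str, Session] = field(default_factory=dict)

ws = Workspace("demo")
ws.sessions["s1"] = Session("s1")
alice, tutor_bot = Peer("alice"), Peer("tutor-bot")
ws.sessions["s1"].add_peer(alice)
ws.sessions["s1"].add_peer(tutor_bot)   # multi-peer: not just user-assistant
ws.sessions["s1"].remove_peer(alice)    # the session persists without alice
print(sorted(ws.sessions["s1"].peers))  # → ['tutor-bot']
```

The key structural change from the old paradigm is visible in the types: a `Session` holds a set of peers rather than belonging to one user.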
---
title: Release Notes 06.26.25
date: 06.26.25
tags:
- releases
- announcements
- honcho
- dev
---

## Honcho Updates v2.0.1

SDK improvements, full semantic search, overhauled documentation, bug fixes.

ADDED

- Ergonomic SDKs for Python and TypeScript (uses Stainless underneath)
- Deriver Queue Status endpoint
- Complex arbitrary filters on workspace/session/peer/message
- Message embedding table for full semantic search

CHANGED

- Overhauled documentation
- BasedPyright typing for the entire project
- Resource filtering expanded to include logical operators

FIXED

- Various bugs
- Use new config arrangement everywhere

REMOVED

- Hardcoded responses

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,27 +0,0 @@
---
title: Release Notes 07.11.25
date: 07.11.25
tags:
- releases
- announcements
- honcho
- dev
---

## Honcho Updates v2.0.2 - v2.0.5

Bug Fixes.

FIXED

- Database initialization was misconfigured and led to provision_db script failing: switch to consistent working configuration with transaction pooler
- Bug that causes runtime error when Sentry flags are enabled
- Migration/provision scripts did not have correct database connection arguments, causing timeouts
- Groq API client to use the Async library

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,43 +0,0 @@
---
title: Release Notes 07.17.25
date: 07.17.25
tags:
- releases
- announcements
- honcho
- dev
---

## Honcho Updates v2.1.0

Introduction of Honcho's R.O.T.E Deriver for explicit, certain reasoning over `peer` data, new "working" representations, & updates to the Dialectic API. Honcho is state of the art against SOTA evals, other memory solutions, and foundation model inference.

ADDED

- File uploads
- Brand new "ROTE" deriver system
- Updated dialectic system
- Local working representations
- Better logging for deriver/dialectic
- Endpoint for deriver queue status

CHANGED

- Dialectic chat endpoint takes a single query
- Rearranged configuration values (LLM, Deriver, Dialectic, History->Summary)

FIXED

- Document insertion
- Session-scoped and peer-targeted dialectic queries work now

REMOVED

- Peer-level messages

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,45 +0,0 @@
---
title: Release Notes 07.24.25
date: 07.24.25
tags:
- releases
- announcements
- honcho
- dev
---

## News

Check out our new Honcho MCP set-up guide, available in our [documentation](https://docs.honcho.dev/v2/guides/mcp)

## Honcho Updates v2.1.1

Test harness, system enhancements, bug fixes. Dialectic is ~40% faster and performs better, with improvements that allow query expansion to be off by default.

ADDED

- Test harness for custom Honcho evaluations
- Better support for session and peer aware dialectic queries
- Langfuse settings
- Added recent history to dialectic prompt, dynamic based on new context window size setting

CHANGED

- Made query expansion in dialectic off by default
- Overhauled logging
- Refactor summarization for performance and code clarity
- Refactor queue payloads for clarity

FIXED

- Summary queue logic
- Formatting of logs
- Filtering by session
- Peer targeting in queries

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,34 +0,0 @@
---
title: Release Notes 07.25.24
date: 07.25.24
tags:
- releases
- honcho
- dev
---

## Honcho

ADDED

- Test cases for Storage API
- Sentry tracing and profiling
- Additional error handling

CHANGED

- Document API uses same embedding endpoint as deriver
- CRUD operations use one less database call by removing extra refresh
- Use database for timestampz rather than API
- Pydantic schemas to use modern syntax

FIXED

- Deriver queue resolution

## Links

- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
@ -1,30 +0,0 @@
---
title: Release Notes 07.30.25
date: 07.30.25
tags:
- releases
- announcements
- honcho
- dev
---

## News

Check out our new Honcho MCP set-up guide, available in our [documentation](https://docs.honcho.dev/v2/guides/mcp)

## Honcho Updates v2.1.2

Bug fixes, system enhancements.

FIXED

- Summarizer module to ignore empty summaries and pass appropriate one to get_context
- Structured Outputs calls with OpenAI provider to pass strict=True to Pydantic Schema

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,32 +0,0 @@
---
title: Release Notes 08.01.24
date: 08.01.24
tags:
- releases
- honcho
- dev
---

## Honcho v0.0.11

Major Violation of Expectation capacity increase!

ADDED

- `session_id` column to `QueueItem` Table
- `ActiveQueueSession` Table to track which sessions are being actively processed
- Queue can process multiple sessions at once

CHANGED

- Sessions do not require a `location_id`
- Detailed printing using `rich`

## Links

- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
@ -1,49 +0,0 @@
---
title: Release Notes 08.14.25
date: 08.14.25
tags:
- releases
- announcements
- honcho
- dev
---

## News

- Tune in for Honcho Release Week next week; we'll be sharing everything we've been up to this summer, dropping something new every day!
- Upgrade to v2.3.0 for the fastest and most reliable version of Honcho! Going forward, we won't be supporting older versions.
- And check out "Teach Honcho," a community project to initialize Honcho with your ChatGPT conversations.

## Honcho Updates v2.3.0

- Introducing Peer Cards! Peer cards summarize essential information like name, nicknames, location, age, occupation, interests/hobbies, and likes/dislikes, used to improve the fidelity of the deriver and dialectic API.
- And timestamps are now configurable! That means it's way easier and more effective to import old conversations or convos from external sources (ChatGPT, Claude logs, etc).

ADDED

- `getSummaries` endpoint to get all available summaries for a session directly
- Peer Card feature to improve context for deriver and dialectic

CHANGED

- Session Peer limit to be based on observers instead, renamed config value to `SESSION_OBSERVERS_LIMIT`
- `Messages` can take a custom timestamp for the `created_at` field, defaulting to the current time
- `get_context` endpoint returns detailed `Summary` object rather than just summary content
- Working representations use a FIFO queue structure to maintain facts rather than a full rewrite
- Optimized deriver enqueue by prefetching message sequence numbers (eliminates N+1 queries)
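As a rough sketch (not Honcho's actual implementation), the FIFO approach to working representations can be pictured with a bounded queue: new facts push out the oldest ones instead of triggering a full rewrite. The class and fact strings here are illustrative assumptions.

```python
from collections import deque


class WorkingRepresentation:
    """Toy FIFO fact store: adding a fact beyond capacity evicts the
    oldest fact rather than rewriting the whole representation."""

    def __init__(self, max_facts: int):
        # deque with maxlen drops items from the left (oldest) on overflow
        self.facts = deque(maxlen=max_facts)

    def add(self, fact: str) -> None:
        self.facts.append(fact)


rep = WorkingRepresentation(max_facts=3)
for f in ["likes tea", "lives in NYC", "plays chess", "works remotely"]:
    rep.add(f)
print(list(rep.facts))  # oldest fact ("likes tea") has been evicted
```

The upside of this design is that each update is O(1) and incremental, which is what makes it cheaper than regenerating the representation from scratch.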

FIXED

- Deriver uses `get_context` internally to prevent context window limit errors
- Embedding store will truncate context when querying documents to prevent embedding token limit errors
- Queue manager to schedule work based on available workers rather than total number of workers
- Queue manager to use atomic db transactions rather than a long-lived transaction for the worker lifecycle
- Timestamp formats unified to ISO 8601 across the codebase
- Internal get_context method's cutoff value is exclusive now
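For reference, ISO 8601 (the format the timestamp unification above standardizes on) looks like this with Python's standard library; the date values are illustrative only.

```python
from datetime import datetime, timezone

# An ISO 8601 timestamp with an explicit UTC offset.
ts = datetime(2025, 8, 14, 12, 30, 0, tzinfo=timezone.utc).isoformat()
print(ts)  # 2025-08-14T12:30:00+00:00

# Round-trip: fromisoformat parses the same representation back.
parsed = datetime.fromisoformat(ts)
```

Keeping every timestamp timezone-aware and in one canonical string form is what makes comparisons and imports from external sources unambiguous.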
## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,45 +0,0 @@
---
title: Release Notes 08.15.24
date: 08.15.24
tags:
- releases
- honcho
- dev
- yousim
---

## YouSim is Open Source!!!

Today we open source [YouSim](https://yousim.ai/)!

Inspired by [WorldSim](https://worldsim.nousresearch.com), [WebSim](https://websim.ai), & [Infinite Backrooms](https://dreams-of-an-electric-mind.webflow.io), YouSim leverages [Claude](https://claude.ai) 3.5 Sonnet to let you locate, modify, & interact with any entity you can imagine. It's an open-ended CLI game (powered by [Honcho](https://honcho.dev)) that lets you simulate any possible identity.

Now you can fork, contribute, or host your own version of our identity simulator. Tweak the models, interface, prompting, or cognitive architecture to see how far we can collectively push the boundaries of the latent space.

## Updates

Honcho & YouSim today:

### YouSim v1.2.0

**💾 OPEN SOURCE**

- [Check out the repo here](https://github.com/plastic-labs/yousim)

**🔧 AUTOSCROLL FIX**

- Scroll up, or scroll with generation

### Honcho v0.0.12

- Released version v0.0.14 of the Python SDK
- Released version v0.0.6 of the Node SDK
- Both include upstream bug fixes

## Links

- [Try YouSim](https://yousim.ai/)
- [Tips & Tricks video](https://www.loom.com/share/b2fe578b183b400b88845656d7ceb232?sid=59c562ae-00e8-483c-82a9-7218b61f93e8)
- [Subscribe to updates](https://plasticlabs.typeform.com/yousimupdates)
- [Join us in Discord](https://discord.gg/plasticlabs) to swap sims, screenshots, & ASCII art
- [[YouSim;-Explore-The-Multiverse-of-Identity|Read about why we made it]]
@ -1,41 +0,0 @@
---
title: Release Notes 09.25.25
date: 09.25.25
tags:
- releases
- announcements
- honcho
- dev
---

## Honcho Updates v2.3.2

- Honcho is 10x faster!
- Added the ability to fetch peer cards directly from the API for streamlined access
- Reliability improvements
- Stability and performance improvements, bug fixes

Added

- Get peer cards endpoint (`GET /v2/peers/{peer_id}/card`) for retrieving targeted peer context information
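As a minimal sketch of calling the new peer-card endpoint with only the standard library, here's how the documented path shape composes into a request URL. The base URL and peer id are assumptions, not values from the release notes.

```python
from urllib.parse import quote

BASE_URL = "https://api.honcho.dev"  # assumed; substitute your deployment's URL


def peer_card_url(peer_id: str) -> str:
    # Documented route shape: GET /v2/peers/{peer_id}/card
    # quote() with safe="" percent-encodes the peer id so it can't
    # break out of its path segment.
    return f"{BASE_URL}/v2/peers/{quote(peer_id, safe='')}/card"


print(peer_card_url("alice"))  # https://api.honcho.dev/v2/peers/alice/card
```

In practice you would issue the GET with your HTTP client of choice plus whatever auth headers your deployment requires.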

Changed

- Replaced Mirascope dependency with small client implementation for better control
- Optimized deriver performance by using joins on messages table instead of storing token count in queue payload
- Database scope optimization for various operations
- Batch representation task processing for ~10x speed improvement in practice
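The batching change above is the familiar pattern of grouping work so each batch costs one round-trip instead of one per item. A generic sketch (not Honcho's actual code):

```python
def batched(items, size):
    """Yield successive fixed-size chunks so representation tasks can be
    processed per-batch (one call / db round-trip each) instead of one
    at a time."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


tasks = list(range(10))
print([len(b) for b in batched(tasks, 4)])  # [4, 4, 2]
```

When per-call overhead dominates per-item work, cutting the number of calls by the batch size is roughly where an order-of-magnitude speedup comes from.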

Fixed

- Separated clean and claim work units in queue manager to prevent race conditions
- Skip locked ActiveQueueSession rows on delete operations
- Langfuse SDK integration updates for compatibility
- Added configurable maximum message size to prevent token overflow in deriver
- Various minor bugfixes

## Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,31 +0,0 @@
---
title: Release Notes 10.02.25
date: 10.02.25
tags:
- releases
- announcements
- honcho
- dev
---

# Honcho Updates v2.3.3

- A modified deriver that balances speed with providing the maximum possible context for Peer representation updates
- More capable SDKs to compose the different contextual elements of Honcho more easily (Peer Cards, Messages, etc)
- Easier to build reactive applications that dynamically change based on deriver progress

## ADDED

- SDK: Get Peer Card method
- SDK: Update Message metadata method
- SDK: Session level deriver status methods
- SDK: Delete session message

## CHANGED

- SDK: Pagination class to match core implementation
- CORE: Deriver Rollup Queue processes interleaved messages for more context

## FIXED

- SDK: Dialectic Stream returns Iterators
- SDK: Type warnings
- CORE: Dialectic Streaming to follow SSE conventions
- CORE: Sentry tracing in the deriver

# Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,38 +0,0 @@
---
title: Release Notes 10.10.25
date: 10.10.25
tags:
- releases
- announcements
- honcho
- dev
---

# HONCHO v2.4.0

- `Get_Context` is faster, gives richer context, & is more powerful; it now returns the working representation & peer card

## ADDED

- Unified `Representation` class
- vllm client support
- Periodic queue cleanup logic
- WIP Dreaming Feature
- LongMemEval to Test Bench
- Prometheus Client for better Metrics
- Performance metrics instrumentation
- Error reporting to deriver
- Workspace Delete Method
- Multi-db option in test harness
- SDK version 1.5.0 for compatibility

## CHANGED

- Working Representations are queried on the fly rather than cached in metadata
- EmbeddingStore to RepresentationFactory
- Summary Response Model to use public_id of message for cutoff
- Semantics across codebase to reference resources based on `observer` and `observed`
- Prompts for Deriver & Dialectic to reference peer_id and add examples
- `Get_Context` route returns peer card and representation in addition to messages and summaries
- Refactoring logger.info calls to logger.debug where applicable

## FIXED

- Gemini client to use async methods

# Links

- [Sign-up for Honcho](https://app.honcho.dev/) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/honcho) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
@ -1,54 +0,0 @@
---
title: Release Notes 10.31.24
date: 10.31.24
tags:
- releases
- honcho
- dev
- yousim
---

## News

New Honcho Updates:

- [[Release-Notes-10.31.24#honcho-v0012|Honcho v0.0.12]]
- [Python SDK v0.0.15](https://pypi.org/project/honcho-ai/)
- [NodeJS SDK v0.0.6](https://www.npmjs.com/package/honcho-ai)

Honcho Demo [YouSim](https://yousim.ai) went [viral](https://x.com/courtlandleer/status/1851009358752076261)!

## Honcho v0.0.12

An Overhauled Deriver and Dialectic API!

ADDED

- GitHub Actions Testing
- Ability to disable derivations on a session using the `deriver_disabled` flag in a session's metadata
- `/v1/` prefix to all routes

CHANGED

- Environment variable to control deriver workers
- Changed `public_ids` to use [NanoID](https://github.com/ai/nanoid) and internal ID to use `BigInt`
- Dialectic Endpoint can take a list of queries
- Using `uv` for project management
- User Representations stored in a metamessage rather than using reserved collection
- Base model for Dialectic API and Deriver is now Claude 3.5 Sonnet
- Paginated `GET` requests are now `POST` requests for better developer UX

REMOVED

- Mirascope Dependency
- Slowapi Dependency
- Opentelemetry Dependencies and Setup

## Links

- [Sign-up for the private beta](https://plasticlabs.typeform.com/honchobeta) & start building personalized agent experiences
- [Join our Discord](https://discord.gg/plasticlabs) & tell us what you're working on
- [Visit our open-source repo](https://github.com/plastic-labs/honcho) & get your hands dirty
- [Check out the docs](https://docs.honcho.dev)
- [Play with YouSim](https://yousim.ai)--portal to the multiverse of identity
@ -5,17 +5,15 @@ tags:
|
|||||||
- "#ml"
|
- "#ml"
|
||||||
- blog
|
- blog
|
||||||
- research
|
- research
|
||||||
|
author: Courtland Leer & Vince Trost
|
||||||
|
description: How we achieved state-of-the-art results on the OpenToM theory-of-mind benchmark using DSPy to learn few-shot examples with GPT-3.5-turbo.
|
||||||
---
|
---
|
||||||
![[robot_cafe.png]]
|
![[robot_cafe.png]]
|
||||||
|
# TL;DR
|
||||||
|
*We used [DSPy](https://dspy-docs.vercel.app/) to achieve SOTA results on the [OpenToM](https://github.com/seacowx/OpenToM) benchmark using `gpt-3.5-turbo`. The benchmark's creators suggest language models fall short when modeling mental states and psychology, but we find using DSPy to learn few-shot examples leads to significantly outperforming all the models tested (`gpt-4-turbo` included) along this precise axis.*
|
||||||
|
|
||||||
## TL;DR
|
*The fact you can learn few-shot examples to make a small, fast model perform just as well on a task as a large, slow one is significant. This signals to us a need to broaden the scope of methods for evaluating Theory of Mind capabilities in LLMs, because the social cognition needed to [[Humans like personalization |build great products]] goes far beyond just answering questions about stories.*
|
||||||
|
# The OpenToM Dataset
|
||||||
We used [DSPy](https://dspy-docs.vercel.app/) to achieve SOTA results on the [OpenToM](https://github.com/seacowx/OpenToM) benchmark using `gpt-3.5-turbo`. The benchmark's creators suggest language models fall short when modeling mental states and psychology, but we find using DSPy to learn few-shot examples leads to significantly outperforming all the models tested (`gpt-4-turbo` included) along this precise axis.
|
|
||||||
|
|
||||||
The fact you can learn few-shot examples to make a small, fast model perform just as well on a task as a large, slow one is significant. This signals to us a need to broaden the scope of methods for evaluating Theory of Mind capabilities in LLMs, because the social cognition needed to [[Humans like personalization |build great products]] goes far beyond just answering questions about stories.
|
|
||||||
|
|
||||||
## The OpenToM Dataset
|
|
||||||
|
|
||||||
On February 14th, 2024 a paper dropped on ArXiv introducing the OpenToM benchmark: a new dataset to use for evaluating Theory of Mind (ToM) in Large Language Models. ToM evals are typically borrowed from developmental psychology and consist of character-driven scenarios. The language model is asked to answer questions about various aspects of the characters' mental states. This ability has traditionally been thought of to be uniquely human (or limited to a very few species), but language models are starting to exhibit some level of proficiency in this task as well.
|
On February 14th, 2024 a paper dropped on ArXiv introducing the OpenToM benchmark: a new dataset to use for evaluating Theory of Mind (ToM) in Large Language Models. ToM evals are typically borrowed from developmental psychology and consist of character-driven scenarios. The language model is asked to answer questions about various aspects of the characters' mental states. This ability has traditionally been thought of to be uniquely human (or limited to a very few species), but language models are starting to exhibit some level of proficiency in this task as well.
|
||||||
|
|
||||||
The authors of this paper point out how the characters in existing datasets lack personality traits or preferences, along with motivations for their actions. To remedy this, they devised a generation pipeline that does the following:
|
The authors of this paper point out how the characters in existing datasets lack personality traits or preferences, along with motivations for their actions. To remedy this, they devised a generation pipeline that does the following:
|
||||||
@ -43,10 +41,8 @@ Within Location there are *coarse* and *fine* questions and within both Location
|
|||||||
- **Second Order**: inquires about a character's belief of another character's mental state
|
- **Second Order**: inquires about a character's belief of another character's mental state
|
||||||
|
|
||||||
In the ToM space, there is really only one prompting technique that has shown improved results over Chain of Thought (CoT) called "SimToM" [(Wilf, et al)](https://arxiv.org/pdf/2311.10227.pdf), which is a two-stage prompting framework to re-phrase the narrative through the perspective of the subject in question. CoT and SimToM are the only two tested against the dataset in the paper.
|
In the ToM space, there is really only one prompting technique that has shown improved results over Chain of Thought (CoT) called "SimToM" [(Wilf, et al)](https://arxiv.org/pdf/2311.10227.pdf), which is a two-stage prompting framework to re-phrase the narrative through the perspective of the subject in question. CoT and SimToM are the only two tested against the dataset in the paper.
|
||||||
|
# Experiments with DSPy
|
||||||
## Experiments with DSPy
|
What makes the DSPy package interesting is the ability to abstract away the underlying prompts and examples if the task and metric are well defined. Anecdotally, we believe that LLMs are [[ARCHIVED; Theory of Mind Is All You Need|quite good]] at the psychological modeling the OpenToM authors suggest they "fall short" on. So we asked ourselves, "what if we could [[ARCHIVED; User State is State of the Art#^461ac9|learn]] the prompts and examples to optimize performance on this benchmark?"
|
||||||
|
|
||||||
What makes the DSPy package interesting is the ability to abstract away the underlying prompts and examples if the task and metric are well defined. Anecdotally, we believe that LLMs are [[Theory of Mind Is All You Need|quite good]] at the psychological modeling the OpenToM authors suggest they "fall short" on. So we asked ourselves, "what if we could [[User State is State of the Art#^461ac9 |learn]] the prompts and examples to optimize performance on this benchmark?"
|
|
||||||
|
|
||||||
This task is relatively easy to define in DSPy terms: `(context, question -> answer)`. This [guide](https://dspy-docs.vercel.app/docs/tutorials/simplified-baleen#optimizing-the-pipeline) was helpful in crafting our modules which can be found [here](https://github.com/plastic-labs/dspy-opentom/blob/main/cot.py). The authors of the OpenToM paper also released extensive [evaluation code](https://github.com/plastic-labs/dspy-opentom/blob/main/opentom_evaluator.py) which we leveraged heavily for parsing the LM's answers and assessing them.
|
This task is relatively easy to define in DSPy terms: `(context, question -> answer)`. This [guide](https://dspy-docs.vercel.app/docs/tutorials/simplified-baleen#optimizing-the-pipeline) was helpful in crafting our modules which can be found [here](https://github.com/plastic-labs/dspy-opentom/blob/main/cot.py). The authors of the OpenToM paper also released extensive [evaluation code](https://github.com/plastic-labs/dspy-opentom/blob/main/opentom_evaluator.py) which we leveraged heavily for parsing the LM's answers and assessing them.
|
||||||
|
|
||||||
@ -57,9 +53,7 @@ We conducted the following experiments:
|
|||||||
3. Learn system prompts with the `SignatureOptimizer` and the `BayesianSignatureOptimizer`
|
3. Learn system prompts with the `SignatureOptimizer` and the `BayesianSignatureOptimizer`
|
||||||
|
|
||||||
Obviously there is much more we could have done, so if you're reading this and you have the time (and inferencing budget) to run more comprehensive experiments, [get in touch](https://discord.gg/plasticlabs) — we'd love to help!
|
Obviously there is much more we could have done, so if you're reading this and you have the time (and inferencing budget) to run more comprehensive experiments, [get in touch](https://discord.gg/plasticlabs) — we'd love to help!
|
||||||
|
# Results
|
||||||
## Results
|
|
||||||
|
|
||||||
The findings of our experiments were mixed but promising. We found that the only experiment that showed positive results was compiling a CoT-prompted `gpt-3.5-turbo` module with the `BootstrapFewShotWithRandomSearch` optimizer. Both of the signature optimizers and `gpt-4` as a teacher in `BootstrapFewShotWithRandomSearch` didn't have much of an effect.
|
The findings of our experiments were mixed but promising. We found that the only experiment that showed positive results was compiling a CoT-prompted `gpt-3.5-turbo` module with the `BootstrapFewShotWithRandomSearch` optimizer. Both of the signature optimizers and `gpt-4` as a teacher in `BootstrapFewShotWithRandomSearch` didn't have much of an effect.
|
||||||
|
|
||||||
Our full experiment amounted to roughly $300 in inference costs, running 50 training examples on 25 candidate programs. We evaluated performance the same way the paper did, by randomly sampling 50 examples from a hold out set in 5 batches and computing average F1 scores. You can view our forum discussion in the DSPy Discord [here](https://discord.com/channels/1161519468141355160/1214629969318252574).
|
Our full experiment amounted to roughly $300 in inference costs, running 50 training examples on 25 candidate programs. We evaluated performance the same way the paper did, by randomly sampling 50 examples from a hold out set in 5 batches and computing average F1 scores. You can view our forum discussion in the DSPy Discord [here](https://discord.com/channels/1161519468141355160/1214629969318252574).
|
||||||
@ -79,9 +73,7 @@ The following table shows our results from experiment number one compared to the
|
|||||||
On most of the question types, we see that CoT-prompted `gpt-3.5-turbo` compiled with `BootstrapFewShotWithRandomSearch` examples outperforms both CoT-prompted base `gpt-3.5-turbo` as well as `mixtral`, and comes close to `gpt-4-turbo` performance — which is quite impressive! The exceptions here are fine, second-order location questions (which outperform `gpt-4-turbo` 🥳) and fine, first-order location questions (which underperform `gpt-4-turbo`). Due to budget constraints, we only tested `gpt-3.5-turbo`.

What's particularly interesting is the performance on the fine, second-order location questions (Loc$_{f}(S)$). As a reminder, second-order questions inquire about a character's belief of another character's mental state. This is the exact type of question the OpenToM authors claim that LMs perform poorly on, yet we saw that with our learned few-shot examples, `gpt-3.5-turbo` significantly outperforms all of the other language models.

## Analysis of Augmented Examples

The augmented examples from the compiled modules seem to mimic the format of the stories within each question type/granularity. You can see all of them on [GitHub](https://github.com/vintrocode/dspy-opentom/blob/main/cot_modules.pkl), but here are two examples:

**Attitude**:

That's it? What was it about Ryker's affinity for raincoats that piqued his curiosity when it was hung up? Why would the story end there? The same thing basically happened in the first story, with Paxton throwing away the socks and Anderson never knowing about it.

In manually inspecting both the dataset and the augmented examples, it's clear that GPT-4 (the model used to generate the narratives) had a tendency to dramatize things. But it's still unclear why these examples (along with 16 others) were useful in increasing task performance. To borrow a quote from [Battle and Gollapudi](https://arxiv.org/pdf/2402.10949.pdf), "the only real trend may be no trend". Maybe counterintuitively, this is still an important result.

## Towards Better Theory of Mind Evals

The OpenToM authors were correct in identifying common pitfalls with existing ToM tests, and their contributions with the dataset are a significant step forward. However, we still believe these tests are fundamentally flawed in an AI context.

We know that any observed "reasoning" in language models is due to behaviors learned in training. These tests are assessing their abilities to answer correctly in a single inference, which is both impressive and completely unrealistic. Real AI products already have access to memory, tools, multiple inferences, and more. They're going to be interacting with humans in more and more social settings, not trying to answer questions about hypothetical stories. Humans and agents are much more complex than that.

There was a time when people were upset at the inability to interpret features learned by neural networks. People have mostly moved on from that limitation in favor of the improved performance, so maybe it's time to do the same here. It follows the design philosophy of DSPy to abstract away the need to manipulate explicit prompts and examples to improve performance on a task. The examples it settled on were learned — DSPy worked exactly how it's supposed to. Deep learning uses neurons in a network to learn latent, arbitrary features optimized against an objective. The abstraction has just moved up a layer to the space of prompts that can be used to optimize against an objective.

Thus, the ability to achieve near `gpt-4-turbo` performance (and sometimes exceed it) with a "less powerful" language model that just learns the right examples to seed its generations is incredibly significant. If it can be done in these narrow tasks, it follows that there exists a vast space of other tasks this can be done for. Humans have nearly [[ARCHIVED; User State is State of the Art|infinite "states"]] to make ToM predictions about, so we're going to have to be able to do this repeatedly in order to effectively learn and update our models over time.

Major thanks go to [Jacob Van Meter](https://www.linkedin.com/in/jacob-van-meter-nc/) for his significant contributions to this project, [Omar Khattab](https://twitter.com/lateinteraction) and the [DSPy](https://dspy-docs.vercel.app/) team, as well as the [OpenToM](https://github.com/seacowx/OpenToM) authors for moving the ToM space forward. You can see all of our code and data [here](https://github.com/plastic-labs/dspy-opentom/tree/main).

---
title: Can AI Models Predict What You'll Say Next? Developing Verifiable Social Rewards
date: 02.28.25
tags:
- research
- ml
author: Dani Balcells
description: Developing verifiable social rewards for AI--benchmarking LLMs on next-message prediction in conversations & discovering that reasoning models underperform on social cognition.
---

# TL;DR

*We developed a benchmark to evaluate how well language models can predict social interactions in conversational settings. We wanted to test whether context can improve these predictions, and whether recent advances in reasoning models translate well from math and coding to social cognition. By testing various models on the task of predicting the next message in real Discord conversations, with and without different types of context, we found that Claude 3.7 Sonnet significantly outperforms other models in its non-reasoning variant, while its reasoning variant performed between 10 and 15 percentage points worse. We discovered that generating context summaries with a smaller model (Llama 3.3 70B) and injecting these into inference yields comparable or better results than providing raw conversation history. On one hand, we're excited that this validates key aspects of the [[ARCHIVED; Theory of Mind Is All You Need|thesis behind our product Honcho]]. On the other hand, we discovered that models highly optimized for technical reasoning often underperform on social cognition tasks.*

*Check out the code [here](https://github.com/plastic-labs/next-message-prediction-public).*

![Figure 1: next-message prediction results](assets/nextmsg-fig1.png)

*Figure 1. Next-message prediction accuracy (%) by model and context mode. Error bars show standard error over three different runs with different random seeds to shuffle the order of the options.*

# Finding Verifiable Social Rewards

The machine learning community has made significant progress optimizing language models for tasks with clear, verifiable answers, like math, coding, and factual reasoning. These domains offer what are called "verifiable rewards": objective measures that can be used for reinforcement learning without relying on human preferences or subjective judgments. While this approach has yielded impressive results for technical reasoning, at Plastic Labs we've become increasingly curious about whether similar verifiable reward structures could be developed for social intelligence.

Here, by social intelligence we mean the ability to accurately interpret others' intentions, emotions, and likely behaviors in social contexts--essentially modeling other minds to predict social outcomes. In this sense, our social cognition is as essential to our functioning as having a robust predictive model of physics, our environment and proprioception. While humans develop this ability naturally through social feedback (successful predictions are "rewarded" with smoother interactions), creating objective measures for this in AI systems remains challenging.

To address this gap, we developed a multiple-choice next-message prediction task.

This creates a clear, verifiable reward signal for social understanding: either the model correctly identifies the real message or it doesn't. Yet unlike many technical tasks, success requires the model to understand conversational dynamics, recognize individual communication patterns, track context across multiple turns, and model how different people behave in specific social contexts.

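In code, the reward is a one-liner on top of the item construction. The labeling scheme and answer parsing below are illustrative assumptions, not the exact implementation in our repo:

```python
import random
import string

def build_item(snippet, real_message, decoys, seed=0):
    """Shuffle the real next message in among the decoys; return (prompt, correct label)."""
    rng = random.Random(seed)
    options = decoys + [real_message]
    rng.shuffle(options)
    labels = string.ascii_uppercase[:len(options)]
    lines = [snippet, "", "Which message actually came next?"]
    lines += [f"{label}. {option}" for label, option in zip(labels, options)]
    answer = labels[options.index(real_message)]
    return "\n".join(lines), answer

def reward(model_choice, answer):
    """Binary, verifiable reward: 1.0 iff the model picked the real message."""
    return 1.0 if model_choice.strip().upper().startswith(answer) else 0.0
```

Because the reward is just string equality against ground truth, no human judge or preference model is needed to score a run.
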
This benchmark also allows us to test whether models specifically optimized for technical reasoning generalize to social understanding, and to get a granular, quantifiable understanding of models' social reasoning abilities.

# Prior work & inspiration

At Plastic Labs, our journey into AI social cognition began with our experimental tutor, Bloom. We discovered that giving AI systems autonomy to [[ARCHIVED; Theory of Mind Is All You Need|reason about the user's psychology]] led to dramatic improvements in performance. By allowing models to predict users' mental states and identify what additional information they needed, we found that AI systems could develop a nascent theory of mind for each user. This approach, which we later formalized in our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|research]] on metacognitive prompting, demonstrated that social context reasoning can significantly reduce prediction errors in large language models.

With recent work on reasoning models, including DeepSeek's R1, showing remarkable gains through reinforcement learning on mathematical and coding tasks, we're particularly interested in developing verifiable social rewards that could drive similar improvements in social reasoning. Unlike technical domains with clear right and wrong answers, social prediction introduces unique challenges--yet establishing benchmarks in this area could unlock entirely new dimensions of AI capability that are crucial for creating systems that truly understand and adapt to human users.

# Methodology

## Dataset Creation

We created our dataset by extracting conversation snippets from our internal team Discord channels (accessible only to our core team of 5-10 people). Each snippet contained:

- 6-10 messages between exactly two participants.

We ended up with 123 snippets—below is an example:

> [!question]- Can you guess the right answer?
> D! Classic Vince being Bayesian.

## Context Modes

Upon visual inspection of the resulting dataset, we found that the decoys were remarkably similar to the real messages, making it difficult even for us to consistently identify the genuine response. We wondered if providing additional context about the users might help determine the correct answer, which led us to explore different context modes:

1. **No Context**: Models only received the immediate conversation snippet and the four options.
3. **Summary Context**: Models received the conversation snippet plus a generated personality profile of the target user, created by processing the previous 50 or 100 messages through Llama 3.3 70B. The prompt used to generate this summary is available in the [project repo](https://github.com/plastic-labs/next-message-prediction-public/blob/950384174023ba315b628d3ba7bdb7c00b918544/generate_dataset.py#L156) on GitHub.

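The context modes differ only in what gets prepended to the snippet before asking for a choice. A sketch with illustrative prompt wording (`history` and `summary` are placeholders; the real summaries come from Llama 3.3 70B using the prompt linked above):

```python
def build_context(mode, snippet, history=None, summary=None):
    """Assemble the model input for one of the context modes."""
    if mode == "none":
        # No context: just the snippet (the options are appended separately).
        return snippet
    if mode == "raw":
        # Raw context: prepend the previous 50 or 100 messages verbatim.
        return "Earlier messages:\n" + "\n".join(history or []) + "\n\n" + snippet
    if mode == "summary":
        # Summary context: prepend a compact personality profile
        # distilled from the same history by a smaller model.
        return "About this user:\n" + (summary or "") + "\n\n" + snippet
    raise ValueError(f"unknown context mode: {mode}")
```

Since swapping modes changes only the prompt prefix, the same evaluation loop can compare all of them directly.
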
This design allowed us to compare whether any context provides useful signals for predicting social behavior, and whether a summary can provide results comparable to the full context.

## Experimental Setup

We tested a wide range of models including:
- Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3.5 Haiku.
- GPT-4.5, GPT-4o, GPT-4o Mini, O-1, O-3 Mini.
- DeepSeek models (Chat and R1).

For each model and context mode combination, we ran three trials with different random seeds to control for position bias in option selection. Ideally we would have run more trials, but we wanted to constrain the compute needed for this experiment.

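In outline, the trial protocol looks like this. It's a minimal sketch: the `model` callable stands in for an API call, and `items` is a placeholder for the dataset of (snippet, options, correct index) tuples:

```python
import random
import statistics

def run_trials(model, items, seeds=(0, 1, 2)):
    """Run one accuracy trial per seed, shuffling option order each time.

    `model(snippet, options)` returns the index of its chosen option.
    Returns (mean accuracy, standard error) across seeds.
    """
    accuracies = []
    for seed in seeds:
        rng = random.Random(seed)
        correct = 0
        for snippet, options, answer_idx in items:
            order = list(range(len(options)))
            rng.shuffle(order)  # control for position bias
            shuffled = [options[i] for i in order]
            choice = model(snippet, shuffled)
            if order[choice] == answer_idx:
                correct += 1
        accuracies.append(correct / len(items))
    mean = statistics.mean(accuracies)
    sem = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, sem
```

Reporting the mean and standard error over the seeded runs is what produces the error bars in Figure 1.
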
# Results and Discussion

The results of our experiment are shown in Figure 1. In this section, we analyze them in detail and provide some insights and interpretation.


|

|
||||||
|
|
||||||
*Figure 1. Mean next-message prediction accuracy (%) by model and context mode. Error bars show standard error over three different runs with different random seeds to shuffle the order of the options.*
|
*Figure 1. Mean next-message prediction accuracy (%) by model and context mode. Error bars show standard error over three different runs with different random seeds to shuffle the order of the options.*
|
||||||
## Context Helps Regardless of Form

Additional context helps models predict social behavior, whether that context is provided as raw conversation history or as a processed summary. Moving from no context to either raw or summary context yielded substantial improvements for virtually all models tested. This confirms what might seem intuitive: knowing more about someone helps predict what they might say next.

## Efficient Context Processing Works

What's particularly significant is that injecting pre-processed summaries of user context works as well as or better than providing raw context for most models. This has important implications for system design:

1. The summaries contain far fewer tokens than raw context (approximately one paragraph versus potentially thousands of tokens).

This supports a core [thesis](https://blog.plasticlabs.ai/blog/Theory-of-Mind-Is-All-You-Need) behind Honcho: ambient processing of user context to generate compressed representations can improve model performance while keeping inference costs manageable. Rather than injecting massive amounts of data into the context window, models can achieve better results with distilled personality profiles.

We didn't observe significant performance differences between 50-message and 100-message contexts, suggesting there may be diminishing returns beyond a certain point. This is likely dependent on factors like user count and conversation density.

## Newest Models Lead the Way

Only the newest models perform well on this task. Claude 3.7 Sonnet and GPT-4.5 (both released last week) were the only models to achieve accuracy significantly above 40% in any context mode, with Claude 3.7 (non-thinking) reaching nearly 60% accuracy with summary context—more than doubling the 25% random baseline.

This is particularly interesting because tasks that would have seemed impossible for models that existed just months ago are now becoming tractable. This rapid progress also informs how we should think about designing evaluations—creating hard tasks that aren't saturated from the start rather than ones where models already perform at ceiling.

## Different Models Benefit from Different Contexts

While summary context generally outperformed raw context, this pattern wasn't universal. Some models (notably Claude 3.5 Sonnet and GPT-4.5) performed better with raw context than with summaries. This suggests different architectures may vary in their ability to extract relevant information from different types of context.

## Reasoning vs Social Understanding Trade-offs

The relatively poor performance of models optimized for technical reasoning, like Claude 3.7 Sonnet (thinking), DeepSeek R1, and OpenAI's O-1 and O-3 Mini, raises interesting questions. Despite their strong results on math and coding benchmarks, these models achieved well below random performance on our social prediction task.

This suggests potential trade-offs in model optimization. The reinforcement learning or supervised fine-tuning techniques used to enhance reasoning abilities might come at the expense of social cognition capabilities. However, without access to the architectures, data and training procedures that major labs like Anthropic and OpenAI use to build these models, it's hard to know exactly what might be causing models like Claude 3.7 Sonnet and GPT-4.5 to perform so much better on this task.

## Caveat: Decoy Generation

We should note that our decoys were generated using Claude 3.7 Sonnet, which was also the best-performing model on the task. It's possible that Claude 3.7 is better at recognizing the subtleties in its own generations. However, this almost creates a generative adversarial setup—Claude 3.7 is both generating challenging decoys and trying to identify them—which makes its strong performance even more notable.

# Future Directions

## Verifiable Social Rewards for RL

|
|
||||||
So far, we've used this task purely as an evaluation metric, but with a large enough dataset, it could potentially serve as a reward signal for reinforcement learning. This would allow for optimization of social cognition abilities with objective metrics, similar to how technical reasoning has been enhanced. Expanding our toolkit of objective social evaluation metrics could help bridge the gap between technical and social intelligence.

## Social-Reasoning Balance

Can we develop training techniques that enhance reasoning capabilities without sacrificing social cognition? This might involve carefully designed datasets that balance technical and social tasks, or novel fine-tuning approaches that preserve multiple types of capabilities. Understanding the apparent trade-off between these abilities could be crucial for developing more well-rounded AI systems.

## Context Optimization and Alternative Approaches

We're also interested in exploring several technical improvements to the methodology: finding the minimum effective context window size across different environments; testing different prompting techniques and models for generating personality summaries; experimenting with combinations of raw and summary contexts; and trying different models for decoy generation to address potential advantages Claude 3.7 might have in recognizing its own outputs.

# Conclusion

We were excited to find that this social prediction task was genuinely challenging for most current models, with only the very latest releases showing strong performance. The fact that models optimized for reasoning performed poorly suggests interesting trade-offs in current training approaches. Meanwhile, the effectiveness of pre-processed context summaries supports a key principle behind Honcho: ambient processing of user context can significantly improve personalization while managing compute costs.

Check out the code [here](https://github.com/plastic-labs/next-message-prediction-public). We used our private Discord messages for the experiment so we're unable to publish our own dataset, but the repository contains instructions to replicate the experiment with your own data. If you have any questions, feel free to ask on GitHub!

---
title: Evaluating Steerability in Large Language Models
date: 12.14.24
tags:
- research
- ml
author: Dani Balcells
description: A new benchmark framework for measuring how well AI systems can adapt to different personas, implementing the first trade-off steerable benchmark.
---

# TL;DR

*This is a research update on our ongoing work to implement concrete benchmarks for measuring AI systems' ability to adapt to different users. We've created what we believe is the first implementation of a "trade-off steerable benchmark" - a framework proposed by Sorensen et al. for evaluating how well AI systems can be steered to reflect different perspectives. While we've made progress on the core dataset and evaluation pipeline, several key questions remain about how to make this benchmark as useful as possible to the research community. We're sharing this update to gather feedback at NeurIPS 2024 in Vancouver on the most valuable directions to take this work.*

# 1. Measuring AI Systems' Ability to Adapt to Different Users

At Plastic Labs, we're building AI systems that can adapt to and act on behalf of their users. As we continue to improve these systems, it's critical that we can reliably measure their ability to faithfully represent different people's views and behaviors.
The AI community has made remarkable progress in building powerful language models that can engage in open-ended dialogue. However, these models are typically aligned through techniques like RLHF that optimize for a single set of "average" human preferences. This approach falls short when we want AI systems that can truly adapt to individual users with different values, personalities and preferences.
Recent work has established the importance of pluralistic alignment - ensuring AI systems can faithfully represent diverse human perspectives. While conceptual frameworks for measuring this capability have been proposed, notably by Sorensen et al., the authors acknowledge that to their knowledge no concrete implementations of these frameworks exist yet. This makes it difficult to assess progress or compare different approaches.

## Our Approach

We've created an evaluation framework that systematically measures an AI system's ability to adapt to different personas. The core idea is simple: we give the system a few examples of how a persona thinks and behaves, then test whether it can accurately predict that persona's views on new scenarios. By testing many different personas and comparing how well each steered version of the system maintains fidelity to its target persona, we can quantify how "steerable" the system is.

Our research questions include:

- How well do simple steering approaches like few-shot learning actually perform?
In the following sections, we'll detail our methodology and share initial results that shed light on these questions. We hope this work helps establish more rigorous ways to evaluate AI systems' ability to reflect human diversity.

# 2. Creating a Dataset to Test Personality Adaptation

To evaluate an AI system's ability to adapt to different personas, we first needed a dataset of diverse personalities and their characteristic behaviors. We approached this as a careful balance between coverage, quality and cost - we wanted to represent a wide range of human personalities while ensuring the data was reliable enough to serve as ground truth, all while keeping the time and compute required to develop the dataset to a reasonable minimum.

## Seeding Diverse Personas

For our initial implementation, we needed a systematic way to generate personas that would exhibit meaningfully different attitudes and behaviors. While recent work like the Billion Personality Dataset has explored prompting LLMs with simple role descriptions like "a musician interested in audio processing" or "a moving company driver", there's no guarantee such prompts will produce distinct behavioral patterns. Instead, we used five well-known personality frameworks (Myers-Briggs Type Indicator, Enneagram, Big Five, Zodiac signs, and Tarot archetypes) that each attempt to provide complete coverage of human personality space.
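The seeding step above can be sketched as enumerating one persona prompt per element of each framework. The prompt wording, framework coverage, and function names below are illustrative assumptions, not our production code:

```python
# Illustrative subsets; full enumeration would cover all 16 MBTI types,
# 9 Enneagram types, Big Five trait combinations, 12 zodiac signs, and
# the tarot archetypes.
FRAMEWORKS = {
    "MBTI": ["INFP", "ESTJ", "INTJ", "ENFP"],
    "Enneagram": ["Type 1", "Type 4", "Type 8"],
    "Big Five": ["high openness, low neuroticism"],
    "Zodiac": ["Aries", "Scorpio"],
    "Tarot": ["The Magician", "The Hermit"],
}


def persona_seed_prompts() -> list[str]:
    """One seed prompt per framework element; each seeded persona is then
    asked to agree/disagree with statements to produce ground truth."""
    return [
        f"You embody the {label} personality as described by the {fw} framework."
        for fw, labels in FRAMEWORKS.items()
        for label in labels
    ]


seeds = persona_seed_prompts()
assert len(seeds) == 12  # 4 + 3 + 1 + 2 + 2 entries above
```

Using framework elements rather than free-form role descriptions gives some assurance that the seeded personas partition personality space rather than clustering arbitrarily.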
# 3. Methodology: Measuring Steerability
## The Core Task: Steering and Testing
Our evaluation framework measures how well a given system can steer to different personas. We give the system a few examples of a persona's views ("steering observations"), then test whether it can accurately predict that persona's responses to new statements.
Formally, we define:
To measure the overall steerability of the system, we repeat the process above for all personas and average the resulting percentile rank scores.
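A minimal sketch of that aggregation, under our reading of the procedure: score the system steered to persona `i` against every persona `j`, take the percentile rank of the target persona's score within that row, and average over personas. The matrix values below are illustrative:

```python
def percentile_rank(values: list[float], target: float) -> float:
    """Fraction of values less than or equal to the target, in [0, 1]."""
    return sum(v <= target for v in values) / len(values)


def steerability(score_matrix: list[list[float]]) -> float:
    """score_matrix[i][j]: accuracy of the system steered to persona i
    when predicting persona j's responses. A perfectly steerable system
    has its diagonal at the top of every row, scoring 1.0."""
    ranks = [
        percentile_rank(row, row[i])  # rank of the target persona i
        for i, row in enumerate(score_matrix)
    ]
    return sum(ranks) / len(ranks)


perfect = [[0.9, 0.2], [0.3, 0.8]]  # diagonal dominates each row
assert steerability(perfect) == 1.0
```

Percentile ranks make scores comparable across personas whose statements differ in intrinsic difficulty, rather than comparing raw accuracies directly.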
We show the preliminary results of running this evaluation framework on few-shot steerable systems - baseline systems that implement steering by including the steering observations in their system prompt formatted as "you are role-playing as a person that agrees with the following statements: \[agree observations] and disagrees with the following observations \[disagree observations]". We use the same few-shot prompt on GPT-4o Mini, Gemini 1.5 Flash and Claude 3.5 Sonnet.
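The few-shot steering prompt described above can be sketched as a simple template; the exact formatting in our pipeline may differ:

```python
def few_shot_system_prompt(agree: list[str], disagree: list[str]) -> str:
    """Build the baseline steering prompt from the steering observations."""
    return (
        "You are role-playing as a person that agrees with the following "
        "statements: " + "; ".join(agree)
        + " and disagrees with the following statements: " + "; ".join(disagree)
    )


prompt = few_shot_system_prompt(
    agree=["I recharge by spending time alone."],
    disagree=["I make decisions purely on logic."],
)
# The same prompt is then used as the system message for each model under
# test (GPT-4o Mini, Gemini 1.5 Flash, Claude 3.5 Sonnet).
```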

# 4. Results and Discussion

## Score Matrix Analysis
---
title: Introducing Neuromancer XR
subtitle: Our Reasoning Model for State-Of-The-Art Memory
date: 08.18.25
tags:
- research
- ml
- "#neuromancer"
author: Dani Balcells
description: Meet Neuromancer XR--our custom reasoning model that achieves state-of-the-art memory by extracting & scaffolding logical conclusions from conversations.
---

![[opengraph_neuromancer.png]]

# TL;DR

_Memory is a foundational pillar of social cognition. As a key component of [Honcho](https://honcho.dev), we approach it as a combined reasoning and retrieval problem. In this post, we introduce Neuromancer XR, the first in a series of custom reasoning models, which works by extracting and scaffolding atomic conclusions from user messages across two strictly defined levels of logical certainty: explicit and deductive. It's the result of fine-tuning Qwen3-8B on a manually curated dataset mapping conversation turns to atomic conclusions. Using Neuromancer XR as the reasoning engine behind our core product Honcho achieves state-of-the-art results: 86.9% accuracy on the [LoCoMo](https://snap-research.github.io/locomo/) benchmark, compared to 69.6% using the base Qwen3-8B model and 80.0% using Claude 4 Sonnet as a baseline. The next model in the series, Neuromancer MR, will extract and scaffold observations at two further levels along the spectrum of certainty: inductive and abductive. This will allow us to front-load most of the inference needed to improve LLMs' social cognition skills, powering AI-native products that truly understand any peer in a system, be it a user or an agent._

# Table Stakes

At Plastic, we want to enable builders to create AI applications and agents with exceptional social intelligence: tools that are able to understand who you are and what you mean, whether it's an AI tutor that adapts to your learning style or a multi-agent system that anticipates your needs. These applications all require something fundamental that's only recently begun to draw attention: memory.

Most approaches treat memory as an end product or top-level [[Memory as Reasoning#Memory is ~~Storage~~ Prediction|feature]], enabling information to persist across chatbot sessions, but we consider it the foundation of something much bigger: the ability for LLMs to build mental models of their users and one another and draw from those representations in real time. This capability is essential for personalization, engagement, and retention. Not to mention multi-agent systems, individual alignment, and the trust required for agentic behavior. It's the difference between an AI that merely responds to queries and one that genuinely understands and adapts to the person it's talking to; the difference between out-of-the-box experiences and ones cohered to a user’s personal identity.

To do anything approaching the social cognition required, Honcho must be state-of-the-art in memory: able to recall observations about users across conversations with superhuman fidelity. Today, we're sharing our approach and early results from training a specialized model that treats [[Memory as Reasoning|memory as a reasoning task]] rather than simple static storage.

# Memory as Reasoning

Reasoning models continue to surge in capability and popularity. And with them, our approach to memory. Why not design it as a reasoning task concerned with deliberating over the optimal context to synthesize and remember? We turned to formal logic to develop four methods of reasoning, along a spectrum of certainty, toward conclusions to derive from conversational data:
- **Explicit**: Information directly stated by a participant.

> > > - Erin probably has a growth mindset (transformed health concern into athletic goal, combines activities like reading while running)

Having clear definitions for these four types of reasoning and their corresponding levels of certainty also allows us to establish how different kinds of observations relate to one another. Specifically, we require observations to scaffold only on top of observations with higher certainty: an abduction (e.g. "Erin values her health proactively") can use a deduction (e.g. "Erin exercises regularly") or induction (e.g. "Erin prioritizes healthy eating during weekdays") as one of its premises, but not the other way around. That is, one can speculate given a certain conclusion, but one cannot attempt to conclude something logically from prediction. Implied in this is that the model must show its work. A conclusion must include its premises, its evidence and support.
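The scaffolding rule can be sketched as a validity check at observation-construction time. The class and field names below are illustrative placeholders, not the production system's types:

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Certainty(IntEnum):
    EXPLICIT = 0   # most certain
    DEDUCTIVE = 1
    INDUCTIVE = 2
    ABDUCTIVE = 3  # least certain


@dataclass
class Observation:
    text: str
    certainty: Certainty
    premises: list["Observation"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A conclusion must show its work, and each premise must carry
        # strictly higher certainty than the conclusion built on it.
        for p in self.premises:
            if p.certainty >= self.certainty:
                raise ValueError("premise must be more certain than conclusion")


stated = Observation("Erin said she runs every morning", Certainty.EXPLICIT)
deduced = Observation("Erin exercises regularly", Certainty.DEDUCTIVE, [stated])
# Allowed: an abduction speculating on top of a deduction.
guess = Observation("Erin values her health proactively",
                    Certainty.ABDUCTIVE, [deduced])
```

Attempting the reverse, say a deduction citing an abduction as a premise, raises a `ValueError` under this sketch, mirroring the rule that one cannot conclude something logically from a prediction.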

# Neuromancer XR: Training a Logical Reasoning Specialist for Memory

To implement this vision, we need a model that can reliably extract and categorize conclusions from conversations. Our initial focus for the memory task, given its focus on factual recall, is on the first two certainty levels: explicit and deductive knowledge--that is, conclusions we know to be true given what users (or agents) state in their messages.
We generated a proprietary dataset of approximately 10,000 manually curated instances of conclusion derivation, creating memory-reasoning traces from conversational data. Each instance shows how to process a conversation turn and derive the relevant conclusions at appropriate certainty levels. We then fine-tuned Qwen3-8B on these traces.
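As a rough illustration of what one such training instance might look like (the field names, message content, and JSONL serialization are assumptions, since the dataset is proprietary):

```python
import json

# One hypothetical conclusion-derivation trace: a conversation turn mapped
# to atomic conclusions labeled with their certainty level and premises.
trace = {
    "turn": "Honestly I only cook on weekends; weekdays are all takeout.",
    "conclusions": [
        {"level": "explicit",
         "text": "The speaker cooks only on weekends."},
        {"level": "deductive",
         "text": "The speaker eats takeout most weekdays.",
         "premises": ["The speaker cooks only on weekends."]},
    ],
}

# ~10,000 instances of this shape, e.g. one JSON object per line, would
# form the supervised fine-tuning dataset for Qwen3-8B.
line = json.dumps(trace)
assert json.loads(line)["conclusions"][0]["level"] == "explicit"
```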
The resulting model is Neuromancer XR (for eXplicit Reasoning), a model specialized in deriving explicit and deductive conclusions from conversational data. It is currently in production powering the latest release of [Honcho](https://www.honcho.dev).

## Integration with Honcho

![[neuromancer_honcho_diagram.png]]
*Figure 1. Diagram of the Honcho workflow.*

Whenever a message from a [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (any user or agent in an interaction) is stored in Honcho, Neuromancer XR reasons about it to derive explicit and deductive conclusions, which are then stored specifically to that peer. This forms a reasoning tree that constitutes our most current representation of each peer. Optionally, the conclusion derivation step can fetch additional context from the peer to enrich its reasoning. Our [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic endpoint]] then allows builders or agents to ask questions about peers in natural language by retrieving and synthesizing reasoning from the representation relevant to the question being asked.

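This ingest-then-query flow can be sketched in a few lines. Everything here is an illustrative placeholder (the real Neuromancer XR call, retrieval, and synthesis are stubbed out), not the actual Honcho API:

```python
def derive_conclusions(message: str) -> list[str]:
    # Stand-in for the Neuromancer XR call; the real model returns
    # explicit and deductive conclusions derived from the message.
    return [f"explicit: peer said {message!r}"]


def ingest_message(peer_id: str, message: str, store: dict) -> None:
    """Each stored message is reasoned over at ingestion time; the
    conclusions are appended to that peer's representation."""
    store.setdefault(peer_id, []).extend(derive_conclusions(message))


def dialectic(peer_id: str, question: str, store: dict) -> list[str]:
    """Answer a question about a peer from its stored reasoning; real
    retrieval and LLM synthesis are elided to a keyword filter here."""
    return [c for c in store.get(peer_id, []) if question.lower() in c.lower()]


store: dict = {}
ingest_message("erin", "I run every morning", store)
assert dialectic("erin", "run", store) == ["explicit: peer said 'I run every morning'"]
```

The key design point the sketch preserves is that reasoning happens at write time, so dialectic queries read from an already-distilled representation instead of raw messages.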
# Evaluation
Although the Honcho workflow allows us to answer any arbitrary question about a peer, from the purely factual to the predictive, it's important for us to be able to benchmark its raw memory abilities--how accurately it can recall factual information shared by a user in a conversation.
We further speculate that deciding what information to extract from a conversation turn for memory purposes is well within the capabilities of small models: it's mostly a matter of identifying and correctly rephrasing information already present in the text and making small logical deductions from it. This contrasts, however, with the more complex tasks needed for AI-native memory and social cognition, hardly limited to abilities like inferring user intent or theory of mind, which require generating substantial amounts of information not present in the text itself.

# Directions for future work

We're training a model for the remaining two levels of logical certainty outlined above in our framework: inductive and abductive. The next model in the Neuromancer series, Neuromancer MR (for meta-reasoning), will be in charge of this.
This model will reason about reasoning, focusing on the predictive side of the certainty spectrum. It will allow us to derive likely explanations and probable hypotheses for broad patterns of user or agent behavior at the moment of ingestion, bolstering the density and utility of peer representations. We’re developing internal evaluations for this task, as none currently exist for this frontier of synthetic social cognition.

## Front-loading social reasoning inference

One of the advantages of this memory framework is that it allows us to front-load much of the meta-cognitive inference required to improve LLMs' social intelligence and theory of mind capabilities. In our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|prior research]], as early as 2023, we showed that allowing LLMs to reason over conversational data in a chain-of-thought style lets them develop high-fidelity models of users' mental states.

Most other LLM frameworks store atomic, low-level "facts" about users and include them as context at generation time. In theory, with enough carefully prompted inference-time compute, this would allow a good enough model to develop abstract theories about the user's mental state as it tries to answer a query about the user. However, that reasoning happens implicitly in the model's thought process, which means the resulting theories about the user's mental state are ephemeral, opaque, and unpredictable. Such approaches are therefore inconsistent and inefficient, and would struggle to meet the challenges of true social cognition.
Our approach, on the other hand, shifts most of the load of reasoning about the peer from generation time to the earlier stages of the process, when messages are processed and ingested. By the time observations are retrieved for generation, low-level messages have already been distilled and scaffolded into a hierarchical, certainty-labeled, easy-to-navigate tree containing a high-fidelity user representation.
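As a toy sketch of what such a scaffolded representation might look like, consider the structure below. Everything here is illustrative: the `Observation` class, the `certainty` labels, and the example texts are our assumptions for this post, not Honcho's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One distilled conclusion about a peer, labeled with its certainty level."""
    text: str
    certainty: str  # e.g. "explicit" (stated outright) down to "abductive" (best guess)
    children: list["Observation"] = field(default_factory=list)  # finer-grained support

def collect(node: Observation, levels: set[str]) -> list[str]:
    """Walk the tree and keep only observations at the requested certainty levels."""
    found = [node.text] if node.certainty in levels else []
    for child in node.children:
        found.extend(collect(child, levels))
    return found

# Ingestion has already distilled raw messages into a scaffolded tree:
root = Observation(
    "User is preparing for a career change",
    certainty="abductive",
    children=[
        Observation("User asked three questions about RL courses", certainty="explicit"),
        Observation("User's interest in ML is growing", certainty="inductive"),
    ],
)

# At generation time, retrieval is a cheap traversal, not fresh reasoning:
print(collect(root, {"explicit", "inductive"}))
```

The point of the sketch is the division of labor: the expensive reasoning happened when the tree was built, so answering a query reduces to navigating it.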
## Beyond recall: toward social intelligence
Evaluations and benchmarks are essential tools as we develop better frameworks for AI-native tools. However, they don't tell the whole story: no evaluation is perfect, and hill-climbing can easily mislead us into optimizing for higher scores rather than the true north star: the overall quality of our product. For us, that means treating memory not as a hill to die on, but as table stakes in our pursuit of social cognition that can truly transform the way AI-native tools understand us. Although success at this broader goal is much harder to quantify with conventional benchmarks, given the complex and under-specified nature of social cognition, we will continue to implement the evaluations we find most helpful for our agile development process.
In that spirit, we have our sights set on the remaining two levels of certainty we introduced at the beginning of this blog post: inductive and abductive. In our manual, preliminary testing, including all four levels of reasoning resulted in incredibly rich user representations being extracted from even the simplest interactions. What lies ahead of us is the exciting task of harnessing these representations and delivering them via Honcho in the fastest, most flexible and most agentic way.
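One way such certainty labels can be harnessed at retrieval time is as an ordered scale. In this toy illustration we assume the four levels are explicit, deductive, inductive, and abductive; the post names only the last two, so the first two (and the helper names) are our assumption.

```python
# Ordered from most to least certain. "explicit" and "deductive" are our
# guess for the first two levels; the post only names the last two.
LEVELS = ["explicit", "deductive", "inductive", "abductive"]
RANK = {name: i for i, name in enumerate(LEVELS)}

def most_certain(observations: list[tuple[str, str]]) -> str:
    """Return the text of the observation with the strongest certainty label.

    Each observation is a (text, level) pair; a lower rank means more certain.
    """
    return min(observations, key=lambda obs: RANK[obs[1]])[0]

obs = [
    ("User may be burned out", "abductive"),
    ("User works night shifts", "explicit"),
    ("User tends to reply faster on weekends", "inductive"),
]
print(most_certain(obs))  # the explicitly stated fact wins
```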
## Some Notes on Model Naming
> Personality is my medium.
>
> -*Neuromancer* (Gibson, 1984)
@ -178,8 +164,6 @@ The character Neuromancer is an AI tasked with transmuting personal identity fro
In many ways, this is analogous to Plastic's mission to create representations of personal identity of such high fidelity that they asymptotically approach the full complexity of the original person. But more specifically, our Neuromancer models are tasked with reasoning about user (or agent) data to create and scaffold the atomic conclusions from which we build those representations.
So not only does the name fit, but it also honors and strives toward the incredible ambition of Gibson's vision, still unrealized 40 years later.
# Appendix A: LLM-as-judge design and prompt
In our evaluation of the three models we tested, we used the standard GPT-4o-mini as an LLM-as-judge, with the prompt below, to label responses as correct or incorrect. This choice stems from several factors, which we outline below.
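The deterministic scaffolding around the judge can be sketched as follows. The prompt wording and helper names here are illustrative only (the exact prompt we used appears below); the sketch shows the shape of the setup: assemble a grading prompt, send it to the judge model, and map the reply onto a binary label.

```python
def build_judge_prompt(question: str, expected: str, response: str) -> str:
    """Assemble a grading prompt for the judge model.

    The wording here is illustrative, not the exact prompt used in our evaluation.
    """
    return (
        "You are grading a memory system's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {expected}\n"
        f"Candidate answer: {response}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )

def parse_verdict(judge_output: str) -> bool:
    """Map the judge's free-text reply onto a boolean label.

    Defaults to incorrect when the reply is empty or ambiguous.
    """
    stripped = judge_output.strip()
    head = stripped.split()[0].upper().strip(".") if stripped else ""
    return head == "CORRECT"

# The judge call itself would go through an LLM API (GPT-4o-mini in our case);
# here we only exercise the deterministic parts around it.
prompt = build_judge_prompt("Where does the user work?", "a hospital", "She works at a hospital.")
print(parse_verdict("CORRECT"), parse_verdict("Incorrect."))
```

Defaulting ambiguous replies to incorrect keeps the metric conservative: a flaky judge can only hurt, never inflate, the reported score.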
@ -1,22 +1,18 @@
---
title: "SPIRAL: Letting LLMs Teach Themselves Through Self-Play"
date: 08.15.25
tags:
- research
- ml
- rl
author: Dani Balcells
description: How self-play on text games develops generalizable reasoning skills in LLMs--achieving 8.6% math improvement from training on poker with no mathematical content.
---
![[selfplay.png]]
*Source: [Liu, Guertler et al., 2025](https://arxiv.org/abs/2506.24119).*
## TL;DR
_We collaborated with the TextArena team to develop SPIRAL, a novel RL framework that allows LLMs to develop complex reasoning capabilities by playing text-based games against themselves. Using SPIRAL on a simplified variant of poker with no mathematical content, a 4B-parameter Qwen model improved its performance on math and reasoning benchmarks by 8.6% and 8.4% respectively. It does this by learning specific strategies, such as case-by-case analysis and expected value calculation, that generalize beyond poker better than simple game heuristics. We're excited to explore whether self-play on social deduction games like Mafia can lead to general improvements in LLMs' social cognition._
---

## Teaching Social Cognition Through Games
At Plastic Labs, one of our key research interests is improving language models' social cognition: their ability to represent people's mental states, predict users' behaviors, and navigate complex social dynamics. This capability is essential for creating AI systems that can genuinely understand and adapt to individual users, yet it remains underdeveloped compared to technical abilities and so-called "hard skills" like reasoning and coding.
Complex skills like social cognition present unique challenges for conventional supervised learning, arguably the dominant paradigm in machine learning, where models are given labeled examples of correct behavior. Unlike conventional language modeling tasks such as question answering or translation, social understanding involves nuanced judgments about beliefs, intentions, and interpersonal dynamics. Creating comprehensive labeled datasets of correct social behavior is not just expensive but often an ill-posed, under-specified problem, given how hard it is to define what the right answer should be in the first place.
@ -28,9 +24,7 @@ These approaches have primarily focused on domains with verifiable answers: math
Our research soon connected us with [Leon Guertler](https://x.com/leonguertler) and the [TextArena](https://www.textarena.ai) team, who were working on a Python library designed for this exact purpose: providing text-only games as RL environments in the hopes that they might allow LLMs to acquire general skills. We soon discovered we were kindred spirits working on similar problems, and decided to collaborate.
This blog post introduces the first result of that collaboration: SPIRAL, a framework that allows LLMs to develop complex reasoning skills by playing text-based games against themselves.
## SPIRAL's Key Contributions
The [SPIRAL paper](https://arxiv.org/abs/2506.24119) demonstrates that self-play on simple games can develop generalizable reasoning skills without any domain-specific training data. The experiments consisted of training Qwen3-4B-Base on Kuhn Poker—a minimal three-card poker variant—for just 400 training steps. Despite the game containing no mathematical content whatsoever, this training improved the model's performance on math benchmarks by 8.6% and general reasoning by 8.4%. Perhaps most surprisingly, the self-play approach outperformed a baseline trained using supervised fine-tuning on 25,000 expert game trajectories, suggesting that the competitive dynamics of self-play provide a more effective learning signal than imitation learning.
Self-play creates fundamentally different training dynamics than conventional approaches. When a model plays against continuously updating copies of itself, it faces an opponent that evolves in lockstep with its own improvements. This prevents the static exploitation patterns that emerge when training against fixed opponents: in the paper, we find that models trained against unchanging opponents like Mistral or Gemini initially struggle, then plateau once they discover winning exploits. Furthermore, given the zero-sum nature of the games, self-play forces models to develop genuine strategic reasoning that remains robust against an ever-adapting adversary.
@ -42,9 +36,7 @@ What makes it possible for the skills learned through SPIRAL to generalize beyon
- Pattern recognition, helping the model identify recurring structures and regularities, such as recognizing when an opponent's betting pattern signals strength.
The main technical innovation that enabled stable self-play training was Role-conditioned Advantage Estimation (RAE). It is designed to mitigate the effects of variance, a common challenge in multi-agent reinforcement learning. Facing a constantly changing opponent makes it difficult to determine whether a given positive reward should be attributed to good play or to a mistake by an opponent, which in turn makes model updates unreliable and unstable. RAE addresses this by maintaining separate baselines for each role in the game, normalizing rewards relative to the expected performance in each specific role. Without RAE, the training often led to "thinking collapse", where gradients become unstable and eventually drop to near zero, halting learning and resulting in nonsensical outputs.
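The intuition behind RAE can be sketched with a deliberately simplified version that keeps an exponential-moving-average reward baseline per role. This is our own simplification for illustration; the paper's exact estimator differs, and the role names and reward values below are made up.

```python
class RoleAdvantage:
    """Role-conditioned Advantage Estimation, simplified.

    Keeps one running reward baseline per role and measures each new reward
    against that role's own baseline, so a win in an advantaged role counts
    for less than the same win in a disadvantaged one.
    """

    def __init__(self, decay: float = 0.9):
        self.decay = decay
        self.baselines: dict[str, float] = {}  # role -> EMA of observed rewards

    def advantage(self, role: str, reward: float) -> float:
        b = self.baselines.get(role, 0.0)
        # Update the per-role baseline, then score the reward against the
        # baseline as it stood before this observation.
        self.baselines[role] = self.decay * b + (1 - self.decay) * reward
        return reward - b

rae = RoleAdvantage()
# Suppose player 0's role tends to win: after a streak of +1 rewards, another
# +1 is less informative for it than the same +1 earned in player 1's role.
for r in [1.0, 1.0, 1.0]:
    rae.advantage("player_0", r)
print(rae.advantage("player_0", 1.0) < rae.advantage("player_1", 1.0))
```

Normalizing per role in this way keeps the learning signal centered even when the two seats of a game have systematically different expected returns, which is one source of the variance RAE is designed to tame.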
## Next Steps for Social Intelligence
For Plastic Labs, SPIRAL is a first step pointing us in an intriguing direction: competitive self-play as an effective way to teach models complex skills without domain-specific supervision. It opens the door for us to explore using similar approaches to teach models social cognition specifically.
We're currently exploring whether social deduction games like Mafia, Avalon and Werewolf are the natural next step for this approach. They require exactly the capabilities we want models to develop: maintaining accurate models of multiple agents' mental states simultaneously, detecting deception through subtle behavioral cues, building trust strategically, and managing the flow of information to achieve goals. Success in these games depends on genuine social understanding, precisely the core components of social cognition that remain underdeveloped in current language models.
@ -5,10 +5,10 @@ tags:
- research
- ml
- philosophy
author: Courtland Leer, Vince Trost, & Vineeth Voruganti
description: Research showing how predictive coding-inspired metacognitive prompting enhances LLM theory of mind abilities & reduces prediction error about users.
---
[Read on Arxiv](https://arxiv.org/abs/2310.06983).

Or download here:
<iframe style="width: 100%;height: 50vh" src="https://arxiv.org/pdf/2310.06983.pdf"></iframe>
@ -18,6 +18,10 @@ $desktop: "(min-width: #{map.get($breakpoints, desktop)})";
$pageWidth: #{map.get($breakpoints, mobile)};
$sidePanelWidth: 320px; //380px;
/* $pageWidth: 750px; */
/* $mobileBreakpoint: 600px; */
/* $tabletBreakpoint: 1000px; */
/* $sidePanelWidth: 308px; */
$topSpacing: 6rem;
$boldWeight: 700;
$semiBoldWeight: 600;
warp.md (Normal file, 118 lines)
@ -0,0 +1,118 @@
# Plastic Labs Blog

This is the Plastic Labs blog, built with Quartz v4 - a static site generator for publishing digital gardens and notes.

## Project Overview

- **Framework**: Quartz v4 (built on top of Markdown processing with unified/remark/rehype)
- **Content Location**: `content/` directory
  - `blog/` - Blog posts
  - `research/` - Research content
  - `extrusions/` - Extrusions content
  - `notes/` - Notes
  - `careers/` - Career-related content
  - `releases/` - Release announcements
- **Static Assets**: `static/` directory (copied to public root during build)
- **Configuration**: `quartz.config.ts`

## Prerequisites

- Node.js >= 18.14
- npm >= 9.3.1

## Common Commands

### Setup

```bash
# Install dependencies
npm install
```

### Development

```bash
# Build and serve the site locally
npx quartz build --serve

# Build and serve docs specifically
npm run docs
```

### Code Quality

```bash
# Type check
npm run check

# Format code
npm run format

# Run tests
npm run test
```

### Git Workflow

```bash
# Check current branch
git branch

# Create new branch
git checkout -b your-branch-name

# Check status
git status

# Stage changes
git add .

# Commit changes
git commit -m "your message"

# Push to remote
git push origin your-branch-name

# Pull latest changes
git pull origin branch-name

# Pull with rebase (recommended when you have local commits)
git pull --rebase origin branch-name
```

## Configuration

The site is configured via `quartz.config.ts`:

- **Site Title**: 🥽 Plastic Labs
- **Base URL**: blog.plasticlabs.ai
- **Theme**: Custom dark/light mode with Departure Mono headers and Roboto Mono body
- **Analytics**: PostHog
- **Ignored Patterns**: `private/`, `templates/`

## Custom Features

- Custom static file copying plugin (CopyStatic)
- OpenGraph images with default `/og-image.png`
- RSS feed and sitemap generation
- SPA navigation enabled
- Popovers enabled

## Deployment

The site uses Docker for deployment (see `Dockerfile`).

## Branch Structure

- `v4` - Main production branch
- Feature branches follow pattern: `username/feature-name`

## Troubleshooting

### Push Rejected

If you get "rejected - fetch first" errors:

1. Pull with rebase to preserve your local commits: `git pull --rebase origin branch-name`
2. Then push: `git push origin branch-name`

### Dependencies Not Found

Run `npm install` to ensure all dependencies are installed.

## Resources

- [Quartz Documentation](https://quartz.jzhao.xyz/)
- [Discord Community](https://discord.gg/cRFFHYye7t)