Mirror of https://github.com/jackyzha0/quartz.git, synced 2025-12-19 10:54:06 -06:00

Commit f4d854ae78 (parent 7cf0f4545a): some callout drafts for archived posts
---
title: "ARCHIVED: A Comprehensive Analysis of Design Patterns for REST API SDKs"
date: 05.09.2024
tags:
- blog
- dev
author: Vineeth Voruganti
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post contains Vineeth's (Plastic Co-founder/CTO) notes on the early design of Honcho's SDKs. For the most up-to-date SDK reference, check out the [Honcho Docs](https://docs.honcho.dev).
>
> Enjoy.
*This post is adapted from [vineeth.io](https://vineeth.io/posts/sdk-development) and written by [Vineeth Voruganti](https://github.com/VVoruganti)*

# TL;DR
After several months of managing the SDKs for Honcho manually, we decided to take a look at the options available for automatically generating SDKs.

From our research we picked a platform and have made brand-new SDKs for Honcho that use idiomatic code, are well documented, and let us support more languages.

---
# Introduction

For the past few months I have been working on managing the [Honcho](https://honcho.dev) project and its associated SDKs. We've been taking the approach of developing the SDK manually as we are focused on trying to find the best developer UX and maximize developer delight.
This has led to a rather arduous effort that has required a large amount of refactoring as we make new additions to the project and the capabilities of the platform rapidly expand.
While these efforts have been going on, a new player in the SDK generation space dropped on [Hacker News](https://news.ycombinator.com/item?id=40146505).
When I first started working on Honcho I did a cursory look at a number of SDK generators, but wasn't impressed with the results I saw. However, a lot of that was speculative, and Honcho was not nearly as mature as it is now.
So, spurred by the positive comments in the thread above, I've decided to take a more detailed look into the space and also try to develop a better understanding of what approaches are generally favorable in creating API client libraries.
# Background

For a full understanding of Honcho I recommend the great [[ARCHIVED; A Simple Honcho Primer|Simple Honcho Primer]] post, but I'll try to summarize the important details here.
Honcho is a personalization platform for LLM applications. It is infrastructure that developers can use for storing data related to their applications, deriving ...

`session = user.create_session()`

There is an Async version of the SDK with an `AsyncHoncho` class that uses objects such as `AsyncSession` and `AsyncUser`.
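The mirrored sync/async split described above can be sketched with minimal stubs. The `Honcho`/`AsyncHoncho`, `User`/`AsyncUser`, and `Session` names come from the post itself, but the constructor arguments, method bodies, and return values here are illustrative assumptions, not the real SDK:

```python
import asyncio

# Hypothetical stubs illustrating a mirrored sync/async interface design;
# the real Honcho SDK's signatures and behavior may differ.
class Session:
    def __init__(self, user_id: str):
        self.user_id = user_id

class User:
    def __init__(self, name: str):
        self.name = name

    def create_session(self) -> Session:
        return Session(user_id=self.name)

class Honcho:
    def get_or_create_user(self, name: str) -> User:
        return User(name)

class AsyncUser(User):
    async def create_session(self) -> Session:  # same shape, but awaitable
        return Session(user_id=self.name)

class AsyncHoncho:
    async def get_or_create_user(self, name: str) -> AsyncUser:
        return AsyncUser(name)

# Sync usage mirrors the post's `session = user.create_session()` call.
honcho = Honcho()
user = honcho.get_or_create_user("alice")
session = user.create_session()
print(session.user_id)  # alice

# Async usage: identical call chain, each step awaited.
async def main() -> str:
    client = AsyncHoncho()
    user = await client.get_or_create_user("bob")
    session = await user.create_session()
    return session.user_id

print(asyncio.run(main()))  # bob
```

The value of this pattern is that switching between sync and async code changes only the `await` keywords, not the mental model of the API.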
# Guiding Questions

Before evaluating the below platforms I wanted to investigate a few questions I had about how to design SDKs and how they are generally maintained in other organizations. I've also included some questions I want to think about when ...

Platform Specific Questions

3. How easy was it to use the tool?
4. What approach does the tool take? Object-oriented or singleton?
5. How does it handle async vs sync interfaces?
# Research

> First I took a look at sources and posts online that talk in general about developing SDKs. This isn't an exhaustive look at every link I looked at, but ...

Most people seem to be saying a full OOP method is overkill, but there are people advocating for having a controller class with methods that take data objects as inputs--essentially advocating for the singleton approach with data-only objects.
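The two styles under debate can be made concrete with a toy sketch. All names here (`FlatClient`, `Message`, `MessageResource`, and so on) are hypothetical illustrations of the pattern, not Honcho's actual API:

```python
from dataclasses import dataclass

# --- Singleton/controller style: one flat client, data-only objects ---
@dataclass
class Message:
    """Plain data object: fields only, no behavior."""
    id: int
    text: str

class FlatClient:
    """Hypothetical controller: every operation is a method on one client."""
    def __init__(self):
        self._store = {1: Message(1, "hello")}

    def get_message(self, message_id: int) -> Message:
        return self._store[message_id]

# --- Object-oriented style: resources expose their own methods ---
class MessageResource:
    def __init__(self, data: Message):
        self._data = data

    def text_upper(self) -> str:  # behavior lives on the resource itself
        return self._data.text.upper()

class OOClient:
    def __init__(self):
        self._flat = FlatClient()

    def message(self, message_id: int) -> MessageResource:
        return MessageResource(self._flat.get_message(message_id))

flat = FlatClient()
print(flat.get_message(1).text)    # singleton style: data in, data out

oo = OOClient()
print(oo.message(1).text_upper())  # OOP style: methods hang off resources
```

The controller style is easier to generate and maintain; the resource style reads more fluently but multiplies the surface area a generator (or maintainer) has to keep consistent.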
## Analysis

Many of the generic concerns of SDK design do not have to do with the UX of the SDK for the end developer, but rather with the background processes that an SDK handles. These include: ...

... but the object-oriented approach may not be as readable, and it could be unclear what methods are doing in complex codebases. Even GPT-4 couldn't decide between the two.
Again and again, the best way to approach SDK development is to just do whatever is easier, and create tons of documentation that will help developers navigate your [API Ladder](https://blog.sbensu.com/posts/apis-as-ladders/). Someone will get confused regardless of what you do, so the key is to make sure the SDK makes sense (even if it's not the most efficient or clean) and to remove hurdles for users navigating errors and mistakes.
# SDK Generation Platforms

With a sense of the best standards for SDK design and the additional features that should be supported in the SDK, I want to look at a few different options to determine the best solution to go with. ...

Below is a list of the different platforms I wanted to review. ...

I was using the OpenAPI Spec for Honcho that was housed at https://demo.honcho.dev/openapi.json.
## Stainless

Since the Hacker News thread for the release of Stainless is what spurred this research, I decided to try them out first. ...

... of the interface. There were also built-in capabilities for retries, pagination, and auth.

There's also the capability to add custom code such as utility functions.
## Speakeasy

Speakeasy required me to do everything locally through their `brew` package. It did not immediately accept the OpenAPI Spec and required me to make some tweaks. ...

The generated SDK didn't feel as strong as the Stainless one. It didn't seem to support `async` methods; it did not use `pydantic` and used the built-in Python `@dataclass`. The methods had really unwieldy names, and it looked like it would need a lot of tweaking to get more production-ready.
## Liblab

Liblab also had me do the generation from the CLI using their npm package. It was pretty straightforward to log in and give it an API spec. Liblab seems to require a lot of tweaking to get better results. It gave me several warnings asking me to ...

... which seems to be the industry standard for codegen tools. The method names were also unwieldy. It also didn't make use of `pydantic` and instead implemented its own `BaseModel` class. It was built on the `requests` library and doesn't seem to support `async` methods.
## OpenAPI Generator

This is the only one on the list that is not expressly backed by a company whose main goal is SDK generation. It is, however, a very popular project with ...

Once again, the SDK uses the `singleton` approach.

I also did not see any indication of functionality for retry logic, authentication, or pagination.
## Conclusion

Overall, Stainless had the results that I liked the most. With almost no work from me, it produced a high-quality SDK that designed things in a sensible way with many built-in features such as retries, pagination, and auth. ...

What I'm looking for right now is the platform or tool that can reduce my work the most and let me focus on other things, and Stainless achieved that. The results are not perfect, but it doesn't look like it'll need more than some slight tweaking and testing to get to a state I want.
# Results

After reaching the conclusion in the previous section, I took some time to fully implement Stainless to make SDKs for Honcho and am proud to announce the release of a new Python SDK and the launch of a brand-new NodeJS SDK.
---
title: "ARCHIVED: A Simple Honcho Primer"
date: 04.16.24
tags:
- blog
- honcho
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
![[bot reading primer.png]]

> [!NOTE] Welcome to our quick, "explain it like I'm 5" guide to [Honcho](https://honcho.dev)!
> We'll keep it simple, covering [[ARCHIVED; A Simple Honcho Primer#^ef795f|what Honcho is]], [[ARCHIVED; A Simple Honcho Primer#^x125da|why we built it]], [[ARCHIVED; A Simple Honcho Primer#^cd2d3c|how to use it]], and [[ARCHIVED; A Simple Honcho Primer#^ca46d7|where the product is going]]. But throughout, we'll link to places you can dive deeper.
## What Is Honcho?
^ef795f

Honcho is a personalization platform for large language model (LLM) applications. ...

It's software infrastructure that lets AI apps "get to know" their users, resulting in delightful experiences and optimized time to value.

We'll have direct consumer experiences in the future, but today, the product is for application developers. It allows them to [[ARCHIVED; Introducing Honcho's Dialectic API#^a14c2f|reduce overhead]] and [[ARCHIVED; Introducing Honcho's Dialectic API#^x7f7f8|enhance their machine learning pipeline]].
Right now, Honcho is in private beta; that means integrating our hosted version requires permission and onboarding[^1]. [You can sign-up here](https://plasticlabs.typeform.com/honchobeta).

In its current form, Honcho has three core components:

1. [[ARCHIVED; Announcing Honcho's Private Beta#^x15f37|Storage]] - managing each user's data & inference about each user
2. [[ARCHIVED; Announcing Honcho's Private Beta#^x53717|Insights]] - processing user data with our proprietary AI models
3. [[ARCHIVED; Announcing Honcho's Private Beta#^ee4516|Retrieval]] - surfacing user data to personalize user experience (UX)
If you've heard of [Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation) (RAG), this might sound familiar. But Honcho is doing *much* more than simple RAG.

Behind the scenes, Honcho learns about users as people--[[ARCHIVED; User State is State of the Art|richly modeling identity]]. It seeks to understand their beliefs, hopes, dreams, history, interests, and preferences.

It then acts as [[ARCHIVED; Introducing Honcho's Dialectic API|an oracle to each user]], allowing apps to ask for any personal context they need to improve UX and giving them access to a social cognition layer.
## Why We Built Honcho
^x125da

Plastic Labs was founded as an edtech company. The original mission was to build an AI tutor that [[ARCHIVED; Open Sourcing Tutor-GPT#^x527dc|could reason like]] the best human instructors. We quickly found the key limitation was data not on the subject matter, but on the student. To overcome it, the tutor needed [[ARCHIVED; Theory of Mind Is All You Need|a way to]] get to know *each* of its students deeply.

Honcho was born by running up against this challenge, building technology to solve it, and realizing all AI applications are going to need the same solutions. The promise of *generative* AI isn't one-size-fits-all products, but bespoke experiences in each moment for each user. The same limitation emerges--how well do you know your user?

But it's not intuitive for a few reasons: ...
Still, when interacting with an AI app, there's a sense that it *should* be getting to know us. In fact, we're often surprised when we realize it's not learning about us over time. And probably annoyed at having to start over.

Think about personalization here as more like the experience of close human companionship or white glove services than the attention-hacking mechanisms of TikTok. There's [[ARCHIVED; Announcing Honcho's Private Beta#^xb6ef1|enormous potential]] for more positive-sum use of user data and for aligning AI applications more closely with user needs and preferences[^2].
## How to Use Honcho
^cd2d3c

But what about vectorDBs? Don't worry, Honcho has you covered there too. You can ...

```
collection.create_document(content="The user is interested in AI")
```
Using Honcho as a storage mechanism allows you to **retrieve** rich insights via the user profiles it's building and managing on the backend. Your application's LLM can access [[Loose theory of mind imputations are superior to verbatim response predictions|theory-of-mind]] inference over those profiles via the *[[ARCHIVED; Introducing Honcho's Dialectic API|dialectic]]* API.

It's simple: just query in natural language using the `session.chat()` method:

```
session.chat("What are the user's interests?")
```
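A common pattern this implies is feeding the dialectic answer straight into your app-side prompt. The sketch below fakes the `session.chat()` call with a stub so the shape of the pattern is runnable; the stub class, its canned answer, and the prompt format are all assumptions, not the real client:

```python
# Hypothetical sketch: inject a dialectic answer into an app-side prompt.
# `StubSession.chat` stands in for Honcho's dialectic endpoint.
class StubSession:
    def chat(self, query: str) -> str:
        # Pretend Honcho answered from its user model.
        return "The user is interested in AI."

def build_prompt(session: StubSession, task: str) -> str:
    """Combine personal context from the dialectic call with the app's task."""
    personal_context = session.chat("What are the user's interests?")
    return f"Personal context: {personal_context}\nTask: {task}"

print(build_prompt(StubSession(), "Recommend an article."))
```

In a real integration the returned context would be spliced into whatever prompt template your cognitive architecture already uses.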
There are a [[ARCHIVED; Introducing Honcho's Dialectic API#How It Works|ton of ways]] to use Honcho; this primer only scratches the surface[^3].

## What's Next for Honcho?
^ca46d7

Beyond improving our internal AI models so they can get to know users as richly as possible, we see three natural extensions in [[ARCHIVED; Announcing Honcho's Private Beta#^eb15f3|Honcho's future]]:
1. [[ARCHIVED; Announcing Honcho's Private Beta#^x2dd3b|Monitoring & Evaluation]] - developer tools to understand & assess the impact of personalization + machine learning tools to build personalized datasets
2. [[ARCHIVED; Announcing Honcho's Private Beta#^a84f44|User-Facing Controls]] - chat with *your* Honcho to direct how it manages & shares data + authenticate with Honcho to sign-in to AI apps
3. [[ARCHIVED; Announcing Honcho's Private Beta#^ebf071|Honcho Application Ecosystem]] - a network of apps contributing to & sharing Honcho data, user-owned & stored in confidential environments

And in just a few weeks, we'll be launching a demo platform where anyone can interact with (& eventually build) Honcho-powered apps.
---
title: "ARCHIVED: Announcing Honcho's Private Beta"
date: 04.01.24
tags:
- announcements
---
Today we're announcing the launch of [Honcho's](https://honcho.dev) private beta. [Sign-up for the waitlist here](https://plasticlabs.typeform.com/honchobeta).

This is a hosted version of our agent personalization platform. It integrates user data storage and theory of mind inference accessible via [[ARCHIVED; Introducing Honcho's Dialectic API|our Dialectic API]]. You can now inject per-user social cognition anywhere in your AI app's architecture.

## The Problem
Setting up a per-user storage framework to manage identities at scale *and* know ...

It's a lot. And trust us, the rabbit hole goes way deeper than that. We obsess over it.

So it's understandable that most projects haven't begun to tackle it. Hell, most haven't even hit this failure mode yet. [[ARCHIVED; Theory of Mind Is All You Need|We have]].

At once, the problem of personalization in AI apps offers both one of the greatest paradigm-shifting opportunities and one of the largest challenges. We're solving it so you don't have to.
Honcho is always updating user identity, so it's ready when you need it.

##### Dialectic API
^ee4516

Our [[ARCHIVED; Introducing Honcho's Dialectic API|Dialectic API]] is how your app-side LLM interfaces with the Honcho-side agent sitting on top of each user identity. This is done in natural language. It's an AI-native endpoint for direct LLM-to-LLM communication.

It allows you to inject personal context and social cognition directly into your app's cognitive architecture wherever you need it, sync or async. Agent-to-agent chat over each user.

[[ARCHIVED; Introducing Honcho's Dialectic API#^57acc3|Here's an extended list of possible ways to use it]].
#### User-Specific Monitoring (coming soon...)
^x2dd3b

Soon, Honcho will support a suite of tools to get the most out of our personaliz...

- **Evaluation & Benchmarking** - the state of theory of mind research is highly compelling, but [[Achieving SOTA on OpenToM with DSPy#^0b4f2e|we need practical, app & user specific evals]]

- **Training Set Curation** - building datasets with personal context [[ARCHIVED; Introducing Honcho's Dialectic API#^f19646|allows more robust, domain-specific training]]; we're building tools for anyone to easily construct, then train on

### The Future of Honcho
---
title: "ARCHIVED: Honcho: User Context Management for LLM Apps"
enableToc: true
date: 01.18.24
tags:
---
So we set out to build a non-skeuomorphic, AI-native tutor that put users first. ...

![[teacher_shoggoth.png]]
*We're not so different after all ([@anthrupad](https://twitter.com/anthrupad)).*

Our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free--precisely because we built [cognitive architectures](https://blog.langchain.dev/openais-bet-on-a-cognitive-architecture/) that mimic the theory-of-mind expertise of highly efficacious 1:1 instructors.

## Context Failure Mode
---
title: "ARCHIVED: Introducing Honcho's Dialectic API"
date: 03.26.24
tags:
- dev
---
Agents need ways to interface dynamically and autonomously, free from the rigidn...

## What's a Dialectic API?

[Honcho](https://honcho.dev) is our platform for personalizing agents to users. Currently, it includes [[ARCHIVED; Honcho; User Context Management for LLM Apps#^a9d0f8|session storage]], BYO context storage, passive [[Loose theory of mind imputations are superior to verbatim response predictions|theory of mind]] user modeling, and *now* an agent deeply coupled to all of that rich user context. That agent can be called via our Dialectic API to surface user data for use with any cognitive architecture.

### How It Works
In designing an LLM pipeline and an application's cognitive architecture, you'll need to decide where and how to inject personal user context so the task is [[Machine learning is fixated on task performance|not simply completed in a general way]], but in the most appropriate way for [[ARCHIVED; User State is State of the Art|each specific user]].

That's when your agent asks Honcho for what it needs in natural language. This query can take many forms. Some possibilities: ...

Extra context improves user response generation; the more specific, the better.

##### Leverage Natural Language Plasticity

Each user has a [[ARCHIVED; User State is State of the Art#^5bc20b|rich and complex personal identity]]. Access to higher-fidelity representations of that identity can be combined with the task completion context of your app in each moment to generate the most optimal tokens for each user-agent interaction. I.e. ones that are felt by the user to be [[Humans like personalization|more personalized and satisfactory]]--enhancing the real and perceived time to value ratio of your app.

But that complexity is hard to capture and needlessly constrained with typical API design. In order to express the nuance of personal context, we need the high-variance, dynamic nature of natural language.
---
title: "ARCHIVED: Memories for All"
date: 02.15.24
tags:
- blog
---
Right now, the vast majority of software UX is a 1-to-many experience. What you ...

AI apps can deal *generatively* with each user on an individual basis, that is, an experience can be produced ad hoc for every user upon every interaction. From 1:many to 1:1 without prohibitive sacrifices in efficiency. But we're still underestimating the full scope of possibility here.

As it stands today the space is mostly focused on the (albeit generative) [[Machine learning is fixated on task performance|1:many tasks LLMs can perform]]. The apps remain more or less stateless with regard to the user. To reach 1:1 nirvana, we need more [[ARCHIVED; Honcho; User Context Management for LLM Apps|user-centric agent design]]. We need frameworks, mechanisms, services, models dedicated to deep coherence with user identity.

Every agent interaction can be generated just in time for every person, informed by relevant personal context more substantive than human-to-human sessions. User context will enable disposable agents on the fly across verticals for lower marginal cost than 1:many software paradigms.
Check out our [LangChain implementation](https://docs.honcho.dev/how-to/personal ...

Where things get powerful is in the aggregate. What resolves is a highly insightful picture of who your users are and what they need--a key context reservoir to improve the qualitative and quantitative experience.

N.b. you can certainly direct the model with as much verbosity as you like, but we've found during extensive experimentation that [[ARCHIVED; Theory of Mind Is All You Need|the more you trust the model]] the better and more useful the results.

This isn't surprising when you consider how much content about what people are thinking is contained in a model's pretraining. It's led to some really exciting [emergent abilities](https://arxiv.org/abs/2302.02083).
---
title: "ARCHIVED: Open-Sourcing Tutor-GPT"
date: 06.02.2023
tags:
- blog
- pedagogy
- ml
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
|
||||
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
|
||||
>
|
||||
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom for now to focus exclusively on Honcho.
|
||||
>
|
||||
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first of its kind, personalized AI tutor, we realized three things (1) all agents will soon need continuous learning systems to understand their users, (2) this an extremely hard problem that every developer shouldn't have to redundantly solve, & (3) we were uniquely positioned to solve it.
>
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
>
> We wrote the following at the very beginning of that transition. It details the benefits of early efforts at model *reasoning* to enhance personalization, architecture that would later inspire Honcho, & the massive space of overhung LLM capabilities we were researching--all quite a bit ahead of its time.
>
> Enjoy.
![[assets/human_machine_learning.jpeg]]
## TL;DR
# TL;DR
Today we’re [open-sourcing](https://github.com/plastic-labs/tutor-gpt) Bloom, our digital [Aristotelian](https://erikhoel.substack.com/p/why-we-stopped-making-einsteins) learning companion.
What makes [Bloom](https://bloombot.ai/) compelling is its ability to _reason pedagogically_ about the learner. That is, it uses dialogue to posit the most educationally-optimal tutoring behavior. Eliciting this from the [capability overhang](https://jack-clark.net/2023/03/21/import-ai-321-open-source-gpt3-giving-away-democracy-to-agi-companies-gpt-4-is-a-political-artifact/) involves multiple chains of [metaprompting](https://arxiv.org/pdf/2102.07350.pdf), enabling Bloom to construct a nascent, academic [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) for each student. ^3498b7
We’re not seeing this in the explosion of ‘chat-over-content’ tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don’t achieve the necessary user intimacy. Infrastructure for that doesn’t exist yet 👀.
We’re now seeing this in the explosion of ‘chat-over-content’ tools, most of which fail to capitalize on the enormous latent abilities of LLMs. Even the impressive out-of-the-box capabilities of contemporary models don’t achieve the necessary user intimacy. Infrastructure for that doesn’t exist yet 👀.
Our mission is to facilitate personal, [agentic](https://arxiv.org/pdf/2304.03442.pdf) AI for all. So to that end, we’re (1) releasing Bloom’s architecture into the wild and (2) embarking on a journey to supercharge the kind of empowering generative agents we want to see in the world.
## Neo-Aristotelian Tutoring
# Neo-Aristotelian Tutoring
Right now, Bloom is a reading comprehension and writing workshop tutor. You can chat with it in [Discord](https://discord.gg/bloombotai). After supplying it a passage, Bloom can coach you toward understanding or revising a piece of text. It does this by treating the user as an equal, prompting and challenging socratically.
We started with reading and writing in natural language because (1) native language acumen is the symbolic system through which all other fluencies are learned, (2) critical dialogue is the ideal vehicle by which to do this, and (3) that's what LLMs are best at right now.
@ -35,10 +44,8 @@ Current compute suggests we can do high-grade 1:1 for two orders of magnitude ch
It's clear generative AI stands a good chance of democratizing this kind of access and attention, but what's less clear are the specifics. It's tough to be an effective teacher that students actually want to learn from. Harder still to let the student guide the experience, yet maintain an elevated discourse.
So how do we create successful learning agents that students will eagerly use without coercion? We think this ability lies latent in foundation models, but the key is eliciting it.
## Eliciting Pedagogical Reasoning
# Eliciting Pedagogical Reasoning
^x527dc
The machine learning community has long sought to uncover the full range of tasks that large language models can be prompted to accomplish on general pre-training alone (the capability overhang). We believe we have discovered one such task: pedagogical reasoning. ^05bfd8
Bloom was built and prompted to elicit this specific type of teaching behavior. (The kind laborious for new teachers, but that adept ones learn to do unconsciously.) After each input it revises a user’s real-time academic needs, considers all the information at its disposal, and suggests to itself a framework for constructing the ideal response. ^285105
@ -73,9 +80,7 @@ Notice how Bloom reasons it should indulge the topic, validate the student, and
Aside from these edgier cases, Bloom shines helping students understand difficult passages (from syntactic to conceptual levels) and giving writing feedback (especially competent at thesis construction). [Take it for a spin](https://discord.gg/udtxycbh).
Ultimately, we hope [open-sourcing Bloom](https://github.com/plastic-labs/tutor-gpt#readme) will allow anyone to run with these elicitations and prompt to expand utility and support multiple domains. We’ll be doing work here too.
## Bloom & Agentic AI
# Bloom & Agentic AI
This constitutes the beginning of an approach far superior to just slapping a chatbot UI over a content library that's probably already in the foundation model's pre-training.
After all, if it were just about content delivery, MOOCs would've solved education. We need more than that to reliably grow rare minds. And we're already seeing Bloom excel at promoting synthesis and creative interpretation within its narrow utility.
@ -1,5 +1,5 @@
---
title: Solving The Campfire Problem with Honcho
title: "ARCHIVED: Solving The Campfire Problem with Honcho"
date: 03.14.2024
tags:
- demos
@ -12,7 +12,7 @@ tags:
Today we're releasing the first demo utilizing Honcho's dialectic API.[^1] Your LLM app/agent can now converse freely with [Honcho](https://honcho.dev)(-as-agent) about a user in natural language: agent-to-agent chat over user context.
The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.
The demo is a "curation buddy" that can chat over links you share. It uses Honcho to [[ARCHIVED; Memories for All|derive and store personal context]] about you over time, then leverages that to be the best reading companion it can be.
Our fractured media landscape is a far cry from narrative meaning making around the tribal campfire. Despite the connective power of the web, many of us subsist in lonely intellectual silos, more diverse but less fulfilling than social discourse.
@ -28,7 +28,7 @@ Enter *Curation Buddy*.
Curation Buddy is an LLM application. It's a Discord bot you can chat with. Share links to any text based media and have substantive conversation.
It uses Honcho to personalize the UX. As you converse, Honcho learns about you. It reasons about the links and conversation to uncover insight into your knowledge, interests, beliefs, desires, [[User State is State of the Art|state]], etc.
It uses Honcho to personalize the UX. As you converse, Honcho learns about you. It reasons about the links and conversation to uncover insight into your knowledge, interests, beliefs, desires, [[ARCHIVED; User State is State of the Art|state]], etc.
This account of user state can then be leveraged by Curation Buddy to behave like a trusted, close intellectual companion.
@ -60,7 +60,7 @@ We'd love to see someone run with and extend this demo. Here are some further Ho
- Construct and maintain full fledged user knowledge graphs
- Automatic bespoke summaries of links informed by graph
- Use Honcho to create training examples for [[User State is State of the Art|user-specific curation models]]
- Use Honcho to create training examples for [[ARCHIVED; User State is State of the Art|user-specific curation models]]
- Autonomously generated user newsletters to supplement conversations async
@ -1,5 +1,5 @@
---
title: Theory-of-Mind Is All You Need
title: "ARCHIVED: Theory-of-Mind Is All You Need"
date: 06.12.2023
tags:
- blog
@ -7,16 +7,25 @@ tags:
- bloom
- pedagogy
---
> [!custom] WELCOME TO THE PLASTIC [[archive|ARCHIVE]]
> This blog post has been archived because it's legacy content that's out-of-date or deprecated. We keep this content around so those interested can dig into the evolution of our projects & thinking.
>
> This post concerns Bloom, our [Honcho](https://honcho.dev)-powered AI-tutor. We've suspended Bloom for now to focus exclusively on Honcho.
>
> Plastic started as an EdTech company, with Bloom as its main product. In building a popular, first-of-its-kind personalized AI tutor, we realized three things: (1) all agents will soon need continuous learning systems to understand their users, (2) this is an extremely hard problem that every developer shouldn't have to redundantly solve, & (3) we were uniquely positioned to solve it.
>
> So we pivoted to Honcho, keeping Bloom around for a while as a demo.
>
> We wrote the following at the very beginning of that transition. The content here gets into the emergent LLM theory of mind capabilities we were exploring at the time, agentic auto-prompting, and the positive effects of personalizing agents--all quite a bit ahead of its time.
>
> Enjoy.
## TL;DR
Today we’re releasing a major upgrade to [Bloom](https://discord.gg/bloombot.ai) (& the open-source codebase, [tutor-gpt](https://github.com/plastic-labs/tutor-gpt)).
We gave our tutor even more autonomy to reason about the psychology of the user, and—using GPT-4 to dynamically _rewrite its own_ system prompts—we’re able to dramatically expand the scope of what Bloom can do _and_ massively reduce our prompting architecture.
We leaned into theory of mind experiments and Bloom is now more than just a literacy tutor; it’s an expansive learning companion.
## Satisfying Objective Discovery
Bloom is already excellent at helping you draft and understand language. But we want it to do whatever you need.
To expand functionality though, we faced a difficult technical problem: figuring out what the learner wants to do.
@ -34,16 +43,14 @@ The key here is they don’t have all the information—they _don’t know_ what
Well we know that (1) foundation models are [shockingly good](https://arxiv.org/abs/2304.11490) at [theory of mind](https://en.wikipedia.org/wiki/Theory_of_mind), (2) Bloom already excels at [pedagogical reasoning](https://twitter.com/courtlandleer/status/1664673210007449605?s=20), and (3) [autonomous agents](https://twitter.com/yoheinakajima/status/1642881722495954945?s=20) are [having early success](https://twitter.com/Auto_GPT/status/1649370049688354816?s=20), so what if we stopped trying to deterministically prescribe an indeterminate intelligence?
What if we treated Bloom with some intellectual respect? ^67d75d
## Autonomous Prompting
The solution here is scary simple. The results are scary good.
[[Open Sourcing Tutor-GPT#^285105|Here’s a description]] of the previous version’s architecture:
[[ARCHIVED; Open Sourcing Tutor-GPT#^285105|Here’s a description]] of the previous version’s architecture:
![[Open Sourcing Tutor-GPT#^285105]]
![[Open Sourcing Tutor-GPT#^1e01f2]]
![[Open Sourcing Tutor-GPT#^b1794d]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^285105]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^1e01f2]]
![[ARCHIVED; Open Sourcing Tutor-GPT#^b1794d]]
Instead, we’ve now repurposed the ***thought*** chain to do two things:
@ -53,9 +60,7 @@ Instead, we’ve now repurposed the ***thought*** chain to do two things:
![[assets/ToM Flow.png]]
Then we inject that generation into the body of the response chain’s system prompt. We do this with every user input. Instead of just reasoning about the learner’s intellectual/academic needs, Bloom now proactively rewrites itself to be as in-tune as possible to the learner at every step of the journey.
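A minimal sketch of that per-turn loop (illustrative only: `call_llm` and both prompt strings are hypothetical stand-ins, not the actual tutor-gpt code):

```python
# Illustrative sketch of the loop described above: a "thought" chain reasons
# about the learner, then its output is injected into the response chain's
# system prompt on every user input. `call_llm` stands in for any
# chat-completion function; the prompts are hypothetical.

def respond(call_llm, history, user_input):
    # Thought chain: revise the read on the learner's current needs
    thought = call_llm(
        system="Predict this learner's current needs and state of mind.",
        messages=history + [user_input],
    )
    # Response chain: system prompt rewritten with that reasoning each turn
    return call_llm(
        system="You are a learning companion.\n"
               f"Current read on the learner:\n{thought}",
        messages=history + [user_input],
    )
```

With a real chat-completion function in place of `call_llm`, the same two calls per turn reproduce the rewrite-the-system-prompt behavior described here.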
## Emergent Effects
We’re seeing substantial positive behavior changes as a result of giving Bloom this kind of autonomy.
![[assets/ToM Discord 1.png]]
@ -71,9 +76,7 @@ And Bloom is game. It’ll go down a rabbit hole with you, help you strategize a
While reducing the prompt material, we took the opportunity to remove basically all references to “tutor,” “student,” etc. We found that since Bloom is no longer contaminated by pointing at [certain averaged narratives in its pre-training](https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post)—e.g. the (bankrupt) contemporary conception of what a tutor is ‘supposed’ to be—it is, ironically, a better one.
Instead of simulating a tutor, it simulates _you_.
## Coming Soon...
All this begs the question: what could Bloom do with even better theory of mind? And how can we facilitate that?
What could other AI applications do with a framework like this?
@ -1,5 +1,5 @@
---
title: User State is State of the Art
title: "ARCHIVED: User State is State of the Art"
date: 02.23.2024
tags:
- blog
@ -1,5 +1,5 @@
---
title: YouSim Launches Identity Simulation on X
title: "ARCHIVED: YouSim Launches Identity Simulation on X"
date: 11.08.2024
tags:
- yousim
@ -17,7 +17,7 @@ GM, simulants.
In response to popular demand, today we're imbuing the [@YouSimDotAI](https://x.com/YouSimDotAI) Twitter account with the ability to simulate identities natively on X.
Keep reading for max context, or [[YouSim Launches Identity Simulation on X#^393e71|jump ahead to learn how to get started]].
Keep reading for max context, or [[ARCHIVED; YouSim Launches Identity Simulation on X#^393e71|jump ahead to learn how to get started]].
## Caught in the Memetic Hurricane
@ -25,7 +25,7 @@ The [full story](https://x.com/courtlandleer/status/1849592301472919986) deserve
An anonymous actor launched a pump.fun token inspired by a demo called [YouSim](https://yousim.ai) we created a few months ago[^1]. [[YouSim; Explore The Multiverse of Identity|YouSim is a CLI interface game]] that lets you simulate any identity you can dream up--real or fictional, local or xeno, entity or artifact.
We originally launched YouSim as a conceptual/narrative demo for our core product [Honcho](https://honcho.dev). Honcho [[A Simple Honcho Primer|helps AI applications improve UX]] by building representations of user identity they can leverage to create better products and experiences.
We originally launched YouSim as a conceptual/narrative demo for our core product [Honcho](https://honcho.dev). Honcho [[ARCHIVED; A Simple Honcho Primer|helps AI applications improve UX]] by building representations of user identity they can leverage to create better products and experiences.
The mission is to become the identity layer for the rapidly approaching agentic world.
@ -124,7 +124,7 @@ with users and one another, and it still suffered from the fundamental problem
of only supporting single-player experiences.
After launching [[YouSim;-Explore-The-Multiverse-of-Identity|YouSim]], and the
explosion of [[YouSim Launches Identity Simulation on X|agents on Twitter]] it
explosion of [[ARCHIVED; YouSim Launches Identity Simulation on X|agents on Twitter]] it
became very clear that Honcho should not be limited to modeling human
psychology, but rather could map the identity of any entity, human or AI. We
were suffering from the human-assistant model and built a solution around that.
@ -37,7 +37,7 @@ It's the most powerful personal identity and social cognition solution for AI ap
Honcho is a cloud-based API that enables more personalized and contextually aware user experiences. It simplifies the process of maintaining context across conversations and interactions, allowing developers to create more responsive and customized agents without managing complex infrastructure.
Honcho combines flexible memory, [[Theory of Mind Is All You Need|theory of mind]] inference, self-improving user representations, and a [[Introducing Honcho's Dialectic API|dialectic API]] to get your application the context it needs about each user for every inference.
Honcho combines flexible memory, [[ARCHIVED; Theory of Mind Is All You Need|theory of mind]] inference, self-improving user representations, and a [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic API]] to get your application the context it needs about each user for every inference.
All this happens ambiently, with no additional overhead to your users--no surveys, no hard coded questions, no BYO data requirements needed to get started. Honcho learns about each of your users in the background as they interact with your application.
@ -22,7 +22,7 @@ Who will you summon?
Large language models are [simulators](https://www.astralcodexten.com/p/janus-simulators).
And [Plastic's](https://plasticlabs.ai) core mission is to enable AI that can simulate you, can model and align to you, and therefore be trusted to act autonomously on your behalf. We're [[Announcing Honcho's Private Beta|starting]] that journey by building [Honcho](https://honcho.dev)--self-improving user memory for AI apps. It [[Humans like personalization|personalizes]] their UX and reduces user and developer overhead across the board. ^7a39cb
And [Plastic's](https://plasticlabs.ai) core mission is to enable AI that can simulate you, can model and align to you, and therefore be trusted to act autonomously on your behalf. We're [[ARCHIVED; Announcing Honcho's Private Beta|starting]] that journey by building [Honcho](https://honcho.dev)--self-improving user memory for AI apps. It [[Humans like personalization|personalizes]] their UX and reduces user and developer overhead across the board. ^7a39cb
All this is possible because the LLM training corpus [[LLMs excel at theory of mind because they read|is packed]] with humans thinking about other humans. It holds close to everything we collectively know about human identity. Not only that, but all our other language and concepts and their possible combinations and permutations.
@ -65,11 +65,11 @@ Enjoy surfing the multiverse of identities...
([Sign-up for updates here](https://plasticlabs.typeform.com/yousimupdates))
## Honcho
If LLMs can simulate infinite identities, then they're uniquely suited to simulate *you*. You in any moment, setting, frame of mind contained in the complexity that is [[User State is State of the Art|your ever changing identity]]. ^25b167
If LLMs can simulate infinite identities, then they're uniquely suited to simulate *you*. You in any moment, setting, frame of mind contained in the complexity that is [[ARCHIVED; User State is State of the Art|your ever changing identity]]. ^25b167
If you're building an AI app, that's the level of personalization now possible. But you've got your vertical-specific tasks to focus on; going down this clearly wacky identity rabbit hole would be redundant and inefficient.
Join >100 projects already on the [private beta waitlist](https://plasticlabs.typeform.com/honchobeta) for [[Announcing Honcho's Private Beta|Honcho's self-improving user memory]].
Join >100 projects already on the [private beta waitlist](https://plasticlabs.typeform.com/honchobeta) for [[ARCHIVED; Announcing Honcho's Private Beta|Honcho's self-improving user memory]].
---
@ -6,9 +6,9 @@ tags:
---
## 2023 Recap
Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"
Last year was wild. We started as an edtech company and ended as anything but. There's a deep dive on some of the conceptual lore in last week's "[[ARCHIVED; Honcho; User Context Management for LLM Apps#^09f185|Honcho: User Context Management for LLM Apps]]:"
>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...
>[Plastic Labs](https://plasticlabs.ai) was conceived as a research group exploring the intersection of education and emerging technology...with the advent of ChatGPT...we shifted our focus to large language models...we set out to build a non-skeuomorphic, AI-native tutor that put users first...our [[ARCHIVED; Open Sourcing Tutor-GPT|experimental tutor]], Bloom, [[ARCHIVED; Theory of Mind Is All You Need|was remarkably effective]]--for thousands of users during the 9 months we hosted it for free...
Building a production-grade, user-centric AI application, then giving it nascent [theory of mind](https://arxiv.org/pdf/2304.11490.pdf) and [[LLM Metacognition is inference about inference|metacognition]], made it glaringly obvious to us that social cognition in LLMs was both under-explored and under-leveraged.
@ -19,7 +19,7 @@ For months before, Plastic had been deep into the weeds around harvesting, retri
As you interface with the entire constellation of AI applications, you shouldn't have to redundantly provide context and oversight for every interaction. You need a single source of truth that can do this for you. You need a Local Honcho.
But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.
But as we've discovered, LLMs are remarkable at theory of mind tasks, and thus at reasoning about user need. So unlike in the book, this administration can be offloaded to an AI. And your [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]] can orchestrate the relevant context and identities on your behalf, whatever the operation.
[^1]: "American English, from [Japanese](https://en.wikipedia.org/wiki/Japanese_language)_[班長](https://en.wiktionary.org/wiki/%E7%8F%AD%E9%95%B7#Japanese)_ (hanchō, “squad leader”)...probably entered English during World War II: many apocryphal stories describe American soldiers hearing Japanese prisoners-of-war refer to their lieutenants as _[hanchō](https://en.wiktionary.org/wiki/hanch%C5%8D#Japanese)_" ([Wiktionary](https://en.wiktionary.org/wiki/honcho))
@ -27,7 +27,7 @@ The more we're missing that, the more we're typically in a principal-agent probl
But, right now, most AI applications are just toys and demos:
![[Honcho; User Context Management for LLM Apps#^18066b]]
![[ARCHIVED; Honcho; User Context Management for LLM Apps#^18066b]]
It's also why everyone is obsessed with evals and benchmarks that have scant practical utility in terms of improving UX for the end user. If we had more examples of good products, ones people loved, killer apps, no one would care about leaderboards anymore.
@ -3,11 +3,11 @@ title: Loose theory of mind imputations are superior to verbatim response predic
date: 02.20.24
---
When we [[Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.
When we [[ARCHIVED; Theory of Mind Is All You Need|first started experimenting]] with user context, we naturally wanted to test whether our LLM apps were learning useful things about users. And also naturally, we did so by making predictions about them.
Since we were operating in a conversational chat paradigm, our first instinct was to try and predict what the user would say next. Two things were immediately apparent: (1) this was really hard, & (2) response predictions weren't very useful.
We saw some remarkable exceptions, but _reliable_ verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context gathering wearables, BMIs, or the network of context sharing apps we're building with [[Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.
We saw some remarkable exceptions, but _reliable_ verbatim prediction requires a level of context about the user that simply isn't available right now. We're not sure if it will require context gathering wearables, BMIs, or the network of context sharing apps we're building with [[ARCHIVED; Honcho; User Context Management for LLM Apps|Honcho]], but we're not there yet.
Being good at what any person in general might plausibly say is literally what LLMs do. But being perfect at what one individual will say in a singular specific setting is a whole different story. Even lifelong human partners might only experience this a few times a week.
@ -3,7 +3,7 @@ title: Machine learning is fixated on task performance
date: 12.12.23
---
The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.
The machine learning industry has traditionally adopted an academic approach, focusing primarily on performance across a range of tasks. LLMs like GPT-4 are a testament to this, having been scaled up to demonstrate impressive & diverse task capability. This scaling has also led to [[ARCHIVED; Theory of Mind Is All You Need|emergent abilities]], debates about the true nature of which rage on.
However, general capability doesn't necessarily translate to completing tasks as an individual user would prefer. This is a failure mode that anyone building agents will inevitably encounter. The focus, therefore, needs to shift from how language models perform tasks in a general sense to how they perform tasks on a user-specific basis.
@ -25,11 +25,11 @@ Today we seem to be in a much different memetic geography--fraught with [epistem
### (Neuro)Skeuomorphism
Thinking LLM-natively has always been a struggle. All our collective [[Memories for All#^0e869d|priors about software]] tell us to [[Honcho; User Context Management for LLM Apps#^dfae31|prompt deterministically]], [[Machine learning is fixated on task performance|perfect tasks]], [[Loose theory of mind imputations are superior to verbatim response predictions|predict exactly]], make it safe, or mire any interesting findings in semantic debate. But in the process we beat the ghost out of the shell.
Thinking LLM-natively has always been a struggle. All our collective [[ARCHIVED; Memories for All#^0e869d|priors about software]] tell us to [[ARCHIVED; Honcho; User Context Management for LLM Apps#^dfae31|prompt deterministically]], [[Machine learning is fixated on task performance|perfect tasks]], [[Loose theory of mind imputations are superior to verbatim response predictions|predict exactly]], make it safe, or mire any interesting findings in semantic debate. But in the process we beat the ghost out of the shell.
Rather than assume the [[Open Sourcing Tutor-GPT#^3498b7|capability overhang]] exhausted (or view it as a failure mode or forget it exists), [Plastic's](https://plasticlabs.ai) belief is we haven't even scratched the surface. Further, we're convinced this is the veil behind which huddle the truly novel applications.
Rather than assume the [[ARCHIVED; Open Sourcing Tutor-GPT#^3498b7|capability overhang]] exhausted (or view it as a failure mode or forget it exists), [Plastic's](https://plasticlabs.ai) belief is we haven't even scratched the surface. Further, we're convinced this is the veil behind which huddle the truly novel applications.
Core here is the assertion that what's happening in language model training and inference is more [[User State is State of the Art#^a93afc|like processes described in cognitive science]] than traditional computer science. More, they're [multidimensional and interobjective](https://en.wikipedia.org/wiki/Timothy_Morton#Hyperobjects) in ways that are hard to grok.
Core here is the assertion that what's happening in language model training and inference is more [[ARCHIVED; User State is State of the Art#^a93afc|like processes described in cognitive science]] than traditional computer science. More, they're [multidimensional and interobjective](https://en.wikipedia.org/wiki/Timothy_Morton#Hyperobjects) in ways that are hard to grok.
### Respect = Trust = Agency
@ -37,7 +37,7 @@ The solution is embrace and not handicap [[Loose theory of mind imputations are
First admit that though poorly understood, LLMs have [[LLMs excel at theory of mind because they read|impressive]] cognitive [[LLM Metacognition is inference about inference|abilities]]. Then, imbue them with [meta-methods](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) by which to explore that potential. Finally, your respect and trust may be rewarded with [something approaching agentic](https://youtu.be/tTE3xiHw4Js?feature=shared).
Plastic's specific project in this direction is [Honcho](https://honcho.dev), a framework that [[User State is State of the Art#^5394b6|trusts the LLM to model user identity]] so that you can trust your apps to extend your agency.
Plastic's specific project in this direction is [Honcho](https://honcho.dev), a framework that [[ARCHIVED; User State is State of the Art#^5394b6|trusts the LLM to model user identity]] so that you can trust your apps to extend your agency.
<div class="tweet-wrapper"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">honcho exists to maximize the dissipation of your agency</p>— Courtland Leer (@courtlandleer) <a href="https://twitter.com/courtlandleer/status/1759324580664000617?ref_src=twsrc%5Etfw">February 18, 2024</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
@ -6,7 +6,7 @@ tags:
- ml
- cogsci
---
While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[User State is State of the Art|vaster than most imagine]].
While large language models are exceptional at [imputing a startling](https://arxiv.org/pdf/2310.07298v1) amount from very little user data--an efficiency putting AdTech to shame--the limit here is [[ARCHIVED; User State is State of the Art|vaster than most imagine]].
Contrast recommender algorithms (which are impressive!) needing mountains of activity data to back into a single preference with [the human connectome](https://www.science.org/doi/10.1126/science.adk4858) containing 1400 TB of compressed representation in one cubic millimeter.
@ -46,7 +46,7 @@ In the ToM space, there is really only one prompting technique that has shown im
## Experiments with DSPy
What makes the DSPy package interesting is the ability to abstract away the underlying prompts and examples if the task and metric are well defined. Anecdotally, we believe that LLMs are [[Theory of Mind Is All You Need|quite good]] at the psychological modeling the OpenToM authors suggest they "fall short" on. So we asked ourselves, "what if we could [[User State is State of the Art#^461ac9 |learn]] the prompts and examples to optimize performance on this benchmark?"
What makes the DSPy package interesting is the ability to abstract away the underlying prompts and examples if the task and metric are well defined. Anecdotally, we believe that LLMs are [[ARCHIVED; Theory of Mind Is All You Need|quite good]] at the psychological modeling the OpenToM authors suggest they "fall short" on. So we asked ourselves, "what if we could [[ARCHIVED; User State is State of the Art#^461ac9|learn]] the prompts and examples to optimize performance on this benchmark?"
This task is relatively easy to define in DSPy terms: `(context, question -> answer)`. This [guide](https://dspy-docs.vercel.app/docs/tutorials/simplified-baleen#optimizing-the-pipeline) was helpful in crafting our modules which can be found [here](https://github.com/plastic-labs/dspy-opentom/blob/main/cot.py). The authors of the OpenToM paper also released extensive [evaluation code](https://github.com/plastic-labs/dspy-opentom/blob/main/opentom_evaluator.py) which we leveraged heavily for parsing the LM's answers and assessing them.
@ -108,7 +108,7 @@ We know that any observed "reasoning" in language models is due to behaviors lea
There was a time when people were upset at the inability to interpret features learned by neural networks. People have mostly moved on from that limitation in favor of the improved performance, so maybe it's time to do the same here. It follows the design philosophy of DSPy to abstract away the need to manipulate explicit prompts and examples to improve performance on a task. The examples it settled on were learned — DSPy worked exactly how it's supposed to. Deep learning uses neurons in a network to learn latent, arbitrary features optimized against an objective. The abstraction has just moved up a layer to the space of prompts that can be used to optimize against an objective.
Thus, the ability to achieve near `gpt-4-turbo` performance (and sometimes exceed it) with a "less powerful" language model that just learns the right examples to seed its generations is incredibly significant. If it can be done in these narrow tasks, it follows that there exists a vast space of other tasks this can be done for. Humans have nearly [[User State is State of the Art |infinite "states"]] to make ToM predictions about, so we're going to have to be able to do this repeatedly in order to effectively learn and update our models over time.
Thus, the ability to achieve near `gpt-4-turbo` performance (and sometimes exceed it) with a "less powerful" language model that just learns the right examples to seed its generations is incredibly significant. If it can be done in these narrow tasks, it follows that there exists a vast space of other tasks this can be done for. Humans have nearly [[ARCHIVED; User State is State of the Art|infinite "states"]] to make ToM predictions about, so we're going to have to be able to do this repeatedly in order to effectively learn and update our models over time.
Major thanks go to [Jacob Van Meter](https://www.linkedin.com/in/jacob-van-meter-nc/) for his significant contributions to this project, [Omar Khattab](https://twitter.com/lateinteraction) and the [DSPy](https://dspy-docs.vercel.app/) team, as well as the [OpenToM](https://github.com/seacowx/OpenToM) authors for moving the ToM space forward. You can see all of our code and data [here](https://github.com/plastic-labs/dspy-opentom/tree/main).
@ -7,7 +7,7 @@ tags:
- ml
---
## TL;DR
We developed a benchmark to evaluate how well language models can predict social interactions in conversational settings. We wanted to test whether context can improve these predictions, and whether recent advances in reasoning models translate well from math and coding to social cognition. By testing various models on the task of predicting the next message in real Discord conversations, with and without different types of context, we found that Claude 3.7 Sonnet significantly outperforms other models in its non-reasoning variant, while its reasoning variant performed between 10 and 15 percentage points worse. We discovered that generating context summaries with a smaller model (Llama 3.3 70B) and injecting these into inference yields comparable or better results than providing raw conversation history. On one hand, we're excited that this validates key aspects of the [[Theory of Mind Is All You Need|thesis behind our product Honcho]]. On the other hand, we discovered that models highly optimized for technical reasoning often underperform on social cognition tasks.
We developed a benchmark to evaluate how well language models can predict social interactions in conversational settings. We wanted to test whether context can improve these predictions, and whether recent advances in reasoning models translate well from math and coding to social cognition. By testing various models on the task of predicting the next message in real Discord conversations, with and without different types of context, we found that Claude 3.7 Sonnet significantly outperforms other models in its non-reasoning variant, while its reasoning variant performed between 10 and 15 percentage points worse. We discovered that generating context summaries with a smaller model (Llama 3.3 70B) and injecting these into inference yields comparable or better results than providing raw conversation history. On one hand, we're excited that this validates key aspects of the [[ARCHIVED; Theory of Mind Is All You Need|thesis behind our product Honcho]]. On the other hand, we discovered that models highly optimized for technical reasoning often underperform on social cognition tasks.
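The evaluation loop described above can be sketched as a simple multiple-choice harness. Everything here is a hypothetical stand-in for our actual pipeline: `model` is any callable that picks which shuffled candidate message it believes actually came next, and `context` is an optional summary injected ahead of the transcript.

```python
import random

def evaluate_next_message(model, examples, context=None, seed=0):
    """examples: list of (transcript, true_next, distractors) tuples.
    Returns the fraction of examples where the model picks the true next message."""
    rng = random.Random(seed)  # fixed seed so candidate order is reproducible
    correct = 0
    for transcript, true_next, distractors in examples:
        candidates = [true_next, *distractors]
        rng.shuffle(candidates)
        # Optionally prepend a generated context summary to the transcript.
        prompt = transcript if context is None else f"{context}\n\n{transcript}"
        choice = model(prompt, candidates)  # model returns a candidate index
        correct += candidates[choice] == true_next
    return correct / len(examples)
```

Because the answer is simply "which message really came next," the score is a verifiable signal: either the model identifies the true continuation or it does not.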
Check out the code [here](https://github.com/plastic-labs/next-message-prediction-public).
@ -25,7 +25,7 @@ This creates a clear, verifiable reward signal for social understanding: either
This benchmark also allows us to test whether models specifically optimized for technical reasoning generalize to social understanding, and to get a granular, quantifiable understanding of models' social reasoning abilities.
## Prior work & inspiration
At Plastic Labs, our journey into AI social cognition began with our experimental tutor, Bloom. We discovered that giving AI systems autonomy to [[Theory of Mind Is All You Need|reason about the user's psychology]] led to dramatic improvements in performance. By allowing models to predict users' mental states and identify what additional information they needed, we found that AI systems could develop a nascent theory of mind for each user. This approach, which we later formalized in our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|research]] on metacognitive prompting, demonstrated that social context reasoning can significantly reduce prediction errors in large language models.
At Plastic Labs, our journey into AI social cognition began with our experimental tutor, Bloom. We discovered that giving AI systems autonomy to [[ARCHIVED; Theory of Mind Is All You Need|reason about the user's psychology]] led to dramatic improvements in performance. By allowing models to predict users' mental states and identify what additional information they needed, we found that AI systems could develop a nascent theory of mind for each user. This approach, which we later formalized in our [[blog/content/research/Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models|research]] on metacognitive prompting, demonstrated that social context reasoning can significantly reduce prediction errors in large language models.
With recent work on reasoning models, including DeepSeek's R1, showing remarkable gains through reinforcement learning on mathematical and coding tasks, we're particularly interested in developing verifiable social rewards that could drive similar improvements in social reasoning. Unlike technical domains with clear right and wrong answers, social prediction introduces unique challenges--yet, establishing benchmarks in this area could unlock entirely new dimensions of AI capability that are crucial for creating systems that truly understand and adapt to human users.
## Methodology
@ -104,7 +104,7 @@ The resulting model is Neuromancer XR (for eXplicit Reasoning), a model speciali
![[neuromancer_honcho_diagram.png]]
*Figure 1. Diagram of the Honcho workflow.*
Whenever a message from a [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (any user or agent in an interaction) is stored in Honcho, Neuromancer XR reasons about it to derive explicit and deductive conclusions, which are then stored specifically to that peer. This forms a reasoning tree that constitutes our most current representation of each peer. Optionally, the conclusion derivation step can fetch additional context from the peer to enrich its reasoning. Our [[Introducing Honcho's Dialectic API|dialectic endpoint]] then allows builders or agents to ask questions about peers in natural language by retrieving and synthesizing reasoning from the representation relevant to the question being asked.
Whenever a message from a [[Beyond the User-Assistant Paradigm; Introducing Peers|peer]] (any user or agent in an interaction) is stored in Honcho, Neuromancer XR reasons about it to derive explicit and deductive conclusions, which are then stored specifically to that peer. This forms a reasoning tree that constitutes our most current representation of each peer. Optionally, the conclusion derivation step can fetch additional context from the peer to enrich its reasoning. Our [[ARCHIVED; Introducing Honcho's Dialectic API|dialectic endpoint]] then allows builders or agents to ask questions about peers in natural language by retrieving and synthesizing reasoning from the representation relevant to the question being asked.
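The workflow above can be illustrated with a small data-structure sketch. This is not Honcho's real implementation or API, just the shape of the idea: each peer accumulates explicit and deductive conclusions as messages arrive, and a dialectic-style query retrieves the relevant ones (here via naive keyword overlap; the real retrieval is far richer).

```python
from dataclasses import dataclass, field

@dataclass
class Conclusion:
    kind: str  # "explicit" (stated by the peer) or "deductive" (inferred)
    text: str

@dataclass
class PeerRepresentation:
    peer_id: str
    conclusions: list = field(default_factory=list)

    def ingest(self, message, reasoner):
        """On each new message, derive conclusions and store them to this peer.
        `reasoner` yields (kind, text) pairs, optionally using prior conclusions."""
        for kind, text in reasoner(message, self.conclusions):
            self.conclusions.append(Conclusion(kind, text))

    def dialectic(self, question):
        """Toy stand-in for the dialectic endpoint: return conclusions that
        share a keyword with the natural-language question."""
        terms = set(question.lower().split())
        return [c.text for c in self.conclusions
                if terms & set(c.text.lower().split())]
```

A `reasoner` here would be the model deriving conclusions per message; the tree of accumulated `Conclusion` records plays the role of the peer's current representation.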
# Evaluation
Although the Honcho workflow allows us to answer any arbitrary question about a peer, from the purely factual to the predictive, it's important for us to be able to benchmark its raw memory abilities--how accurately it can recall factual information shared by a user in a conversation.