diff --git a/content/research/Benchmarking Honcho.md b/content/research/Benchmarking Honcho.md
index 7a4652fa2..89e62ce5b 100644
--- a/content/research/Benchmarking Honcho.md
+++ b/content/research/Benchmarking Honcho.md
@@ -14,9 +14,9 @@ description: Honcho achieves state-of-the-art performance & pareto dominance acr
 # TL;DR
 *Honcho achieves state-of-the-art performance across the LongMem, LoCoMo, and BEAM memory benchmarks--**90.4%** on LongMem S (**92.6%** with Gemini 3 Pro), **89.9%** on LoCoMo ([beating our previous score of **86.9%**](https://blog.plasticlabs.ai/research/Introducing-Neuromancer-XR)), and top scores across all BEAM tests. We do so while maintaining SOTA token efficiency.*
 
-*But testing recall on benchmark data that fits in frontier context windows is no longer particularly meaningful. Beyond simple recall, Honcho reasons over memory and empowers frontier models to reason across more tokens than their context windows support.
+*But testing recall on benchmark data that fits in frontier context windows is no longer particularly meaningful. Beyond simple recall, Honcho reasons over memory and empowers frontier models to reason across more tokens than their context windows support.*
 
-Check out [evals.honcho.dev](https://evals.honcho.dev) for charts and comparisons.*
+*Check out [evals.honcho.dev](https://evals.honcho.dev) for charts and comparisons.*