vault backup: 2025-03-25 00:31:54

themodernhakr 2025-03-25 00:31:54 -05:00
parent 7d6c3784a3
commit 3168fa8b92


@ -2,11 +2,11 @@
title: when two macros are faster than one
draft: "false"
---
While working on my Database datapack (still WIP), I knew I'd want to find the most efficient way to dynamically access dynamically populated arrays. I had some ideas and decided to benchmark them using [Kragast's Benchmark Datapack](https://www.planetminecraft.com/data-pack/benchmark-6443027/). This process was really illuminating to me, and I hope it will be for you as well. Thanks for all the help from **PukiSilver**, **amandin**, and **Nicoder**.
# scenario
## dataset
The testing data is stored in the storage `#_macro.array`. The array is populated with a total of 500 entries, each having `id` and `string` fields.
```json
[
{
@ -114,4 +114,24 @@ data remove storage test_namespace:test_namespace temp.index
# call the function that consumes 'temp.result', then remove it
data remove storage test_namespace:test_namespace temp.result
```
# two is faster than one??
I ran benchmarks on a simple iteration-based function and the single-macro function suggested by **PukiSilver** and **amandin**. I also threw in the two-macro indexing function since I had already coded it. I assumed using one macro would be faster than two, but I was curious exactly *how* much faster it would be.
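For reference, the iteration-based function is the usual recursive "pop and compare" loop. The sketch below is just to show the shape of it; the storage paths, the `#match` score holder, and the `bench.tmp` objective are illustrative rather than the exact benchmark code, and it assumes `temp.copy` was set from the array and `temp.keyword` was set to the search string before the first call.
```mcfunction
# iterate_search sketch: scan a working copy of the array one element at a time
# look at the first remaining entry
data modify storage test_namespace:test_namespace temp.current set from storage test_namespace:test_namespace temp.copy[0]
# "set from" fails when nothing changes, so success=0 means the strings already matched
execute store success score #match bench.tmp run data modify storage test_namespace:test_namespace temp.current.string set from storage test_namespace:test_namespace temp.keyword
# on a match, keep the whole entry as the result
execute if score #match bench.tmp matches 0 run data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace temp.copy[0]
# otherwise drop this entry and recurse until the copy runs out
data remove storage test_namespace:test_namespace temp.copy[0]
execute if score #match bench.tmp matches 1 if data storage test_namespace:test_namespace temp.copy[0] run function test_namespace:iterate_search
```
Every element costs a function call plus several `data` commands, so the per-lookup work grows linearly with the array.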
As expected, the iteration-based function was sloooooow. Both macro functions blew it out of the water. Unexpectedly, however, the `two_macro` function more than doubled the performance of the `one_macro` function. Here are the results (bigger is better):
| **function** | **benchmark score** |
| ------------ | ------------- |
| iteration | 416 |
| one_macro | 30342 |
| two_macro | 72450 |
The `two_macro` function is *2.4x* faster than the `one_macro` function (72450 / 30342 ≈ 2.4).
What the heck is going on? How does *adding* an entire second macro function *improve* performance??
It turns out that the clever and convenient `one_macro.array[string:$(keyword)]` still iterates over the array to find the match; it just does so inside the game's own Java path-matching code instead of in mcfunction. That makes it much faster than iterating in mcfunction, but the cost still grows with the array: O(n). In contrast, the `two_macro` approach jumps straight to a value by `key` and then by `index`, and those lookups are O(1). While I haven't tested it, this means that on a larger dataset the gap between `two_macro` and `one_macro` should only widen.
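To make that concrete, here's roughly what the two access patterns look like written out in full path syntax. The `index.$(keyword)` compound, the function names, and the `temp` scratch paths are just how I'm illustrating it, and it assumes each `string` value is unique and simple enough to double as a bare compound key:
```mcfunction
# one_macro sketch: the [{...}] filter makes the game scan the array for the match (O(n))
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace array[{string:"$(keyword)"}]

# two_macro sketch, function 1: look the keyword up in a prebuilt index compound (O(1))
$data modify storage test_namespace:test_namespace temp.index set from storage test_namespace:test_namespace index.$(keyword)
function test_namespace:get_by_index with storage test_namespace:test_namespace temp

# two_macro sketch, function 2 (test_namespace:get_by_index): jump straight to that element (O(1))
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace array[$(index)]
```
The second hop does cost an extra `function ... with storage` call, but that's a fixed price, while the filter's scan keeps growing with the array.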
# takeaways
Indexing is cool. If you find yourself in a situation where you're working with moderate-to-large arrays and are able to index in advance of querying data, it's absolutely worth it from a query performance standpoint.
However, *indexing* is itself pretty expensive, and it requires planning ahead while writing the datapack. Whenever an item is added, updated, or deleted, the index has to be updated with it, and a scheduled task should probably run every so often to audit the indexes and identify potential errors. Retrofitting an index onto a field that already has data in it could also be annoying.
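As a sketch of what that maintenance might look like, an append helper (run as a macro function with `id` and `string` arguments; the paths are again just illustrative) could keep the index in sync like this:
```mcfunction
# add_entry sketch: append a new entry and record its position in the index
# data get on a list returns its length, i.e. the 0-based slot the new entry will occupy
execute store result storage test_namespace:test_namespace temp.pos int 1 run data get storage test_namespace:test_namespace array
# append the entry itself
$data modify storage test_namespace:test_namespace array append value {id:$(id),string:"$(string)"}
# point the index at the new position
$data modify storage test_namespace:test_namespace index.$(string) set from storage test_namespace:test_namespace temp.pos
```
Deletes are the painful case: removing an element shifts every later position, so you either rebuild the index afterwards or tombstone entries instead of removing them.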
Point being: if it's worth it, *it's worth it*; if it's not, the `one_macro` one-liner is simpler and fast enough for most applications.