---
title: when two macros are faster than one
draft: "false"
---

While working on my Database datapack (still WIP), I knew I'd want to find the most efficient way to dynamically access arrays that are themselves dynamically populated. I had some ideas and decided to benchmark them using [Kragast's Benchmark Datapack](https://www.planetminecraft.com/data-pack/benchmark-6443027/). This process was really illuminating to me, and I hope it will be for you as well. Thanks for all the help from **PuckiSilver**, **amandin**, and **Nicoder**.

# scenario

## dataset

The testing data is stored in NBT storage at the path `#_macro.array` (where `#_macro` is `one_macro` or `two_macro`, depending on the method being tested). The array is populated with a total of 500 entries, each having `id` and `string` fields.

```json
[
  {
    id: 1,
    string: "entry1"
  },
  ...
  {
    id: 500,
    string: "entry500"
  }
]
```

## constraints

The objective is to create an interface that receives a keyword, say `entry500`, and searches `#_macro.array` for an entry where the value of `string` matches the keyword.

The keyword must be able to be entered by a player at runtime, and `#_macro.array` can have an arbitrary number of custom entries created by a player.

In TypeScript, it would look something like this:

```ts
function searchArray(keyword: string) {
  // logic
  return theRelevantEntry
}

searchArray('entry500')
```

In mcfunction, this is not so straightforward. Macros would make this really clean:

```vb
function test_namespace:search_array {keyword: "entry500"}
```

Unfortunately, macros come with a performance hit. A more performant method, albeit less elegant, is to store the keyword in NBT storage prior to calling the function. The storage can be removed after the function is run:

```vb
data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'
function test_namespace:search_array
data remove storage test_namespace:test_namespace temp.keyword
```

Once the entry is found, it is stored in the `temp.result` storage, which can then be consumed by another function.

Now for the logic to do the actual array searching...

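For reference, the iteration-based baseline that gets benchmarked later is, conceptually, just a linear scan. A minimal TypeScript sketch of the idea (the data and function names here are illustrative, not from the datapack):

```typescript
// Hypothetical stand-in for the mcfunction dataset: 500 entries with id and string.
interface Entry { id: number; string: string }

const array: Entry[] = Array.from({ length: 500 }, (_, i) => ({
  id: i + 1,
  string: `entry${i + 1}`,
}))

// Linear scan: check every entry until the string field matches. O(n).
function searchArrayByIteration(keyword: string): Entry | undefined {
  for (const entry of array) {
    if (entry.string === keyword) return entry
  }
  return undefined
}

console.log(searchArrayByIteration('entry500')) // → { id: 500, string: 'entry500' }
```
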
# one macro

Macros allow us to reach into our array and pick out an entry whose `string` value matches the keyword. This is something that (for some reason) I didn't realize was possible; it was pointed out by **PuckiSilver** and **amandin** on the Datapack Hub Discord server.

```vb
... one_macro.array[{string: "$(keyword)"}]
```

This method is super clean and results in a one-liner that is wordy but simple:

```vb
# one_macro/_searcharray.mcfunction

$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace one_macro.array[{string: "$(keyword)"}]
```

`_searcharray` can then be called using the `temp.keyword` storage:

```vb
# one_macro/run.mcfunction

data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'

function test_namespace:one_macro/_searcharray with storage test_namespace:test_namespace temp

data remove storage test_namespace:test_namespace temp.keyword

# call the function that consumes temp.result, then remove it
data remove storage test_namespace:test_namespace temp.result
```

# two macro

Another way to crack the problem is through indexing. This was my original plan, from before I realized that `...[{string: "$(keyword)"}]` was possible.

This method requires the creation of an index of the field that is going to be searched. The index will look something like this:

```json
[
  {entry1: 0},
  {entry2: 1},
  ...
  {entry500: 499}
]
```

The *key*, e.g. `entry2`, corresponds to the value of a `string` field in the main array, while the value, e.g. `1`, indicates the main-array index where we'll find the full entry. The index can be searched with a direct path, `index.$(keyword)`, and the main array can then be accessed with a direct reference to the entry index, `array[$(index)]`. Keep in mind that *the index must already exist prior to running the search function*. In a practical application, an index could be updated every time the main array is updated. A scheduled task could also audit the index to ensure that it's up to date.

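Conceptually, the index is just a key-to-position map built ahead of time, and the two macro functions are two O(1) steps. A TypeScript sketch under that assumption (names are illustrative, not from the datapack):

```typescript
interface Entry { id: number; string: string }

const array: Entry[] = Array.from({ length: 500 }, (_, i) => ({
  id: i + 1,
  string: `entry${i + 1}`,
}))

// Build the index once, ahead of queries: string value -> array position.
const index = new Map<string, number>()
array.forEach((entry, i) => index.set(entry.string, i))

// Two O(1) steps, mirroring the two macro functions:
// 1. look up the position in the index (_searchindex),
// 2. read the array at that position (_searcharray).
function searchArrayByIndex(keyword: string): Entry | undefined {
  const i = index.get(keyword)
  return i === undefined ? undefined : array[i]
}
```
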
The index search looks like this:

```vb
# two_macro/_searchindex.mcfunction

$data modify storage test_namespace:test_namespace temp.index set from storage test_namespace:test_namespace two_macro.index.$(keyword)
```

And the array search looks like this:

```vb
# two_macro/_searcharray.mcfunction

$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace two_macro.array[$(index)]
```

The index and array search functions are then called using the `temp.keyword` storage:

```vb
# two_macro/run.mcfunction

data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'

function test_namespace:two_macro/_searchindex with storage test_namespace:test_namespace temp

function test_namespace:two_macro/_searcharray with storage test_namespace:test_namespace temp

data remove storage test_namespace:test_namespace temp.keyword
data remove storage test_namespace:test_namespace temp.index

# call the function that consumes temp.result, then remove it
data remove storage test_namespace:test_namespace temp.result
```

# two is faster than one??

I ran benchmarks on a simple iteration-based function and the single-macro function suggested by **PuckiSilver** and **amandin**. I also threw in the two-macro indexing function, since I had already coded it. I assumed using one macro would be faster than two, but I was curious exactly *how much* faster it would be.

As expected, the iteration-based function was sloooooow; both macro functions blew it out of the water. Unexpectedly, however, the `two_macro` function more than doubled the performance of the `one_macro` function. Here are the results (bigger is better):

| **function** | **benchmark** |
| ------------ | ------------- |
| iteration    | 416           |
| one_macro    | 30342         |
| two_macro    | 72450         |

The `two_macro` function is *2.4x* faster than the `one_macro` function.

What the heck is going on? How does *adding* an entire second macro function *improve* performance??

It turns out that the clever and convenient `one_macro.array[{string: "$(keyword)"}]` triggers iteration under the hood to filter the array. Because that iteration happens in Java rather than in mcfunction, it's still much faster than iterating with commands, but the cost is O(n). In contrast, the `two_macro` approach accesses values directly by key and by index, and both of those operations are O(1). While I haven't tested it, this means that, when run on a larger dataset, the gap between `two_macro` and `one_macro` should continue to widen.

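A quick way to see why the gap should widen, without firing up a Minecraft benchmark, is to count how many entries each approach touches as the dataset grows (illustrative TypeScript, not the benchmark pack's methodology):

```typescript
// Count how many entries a linear scan touches versus a map lookup
// as the dataset grows. The scan's cost tracks n; the map's stays flat.
function comparisonsForWorstCase(n: number): { scan: number; map: number } {
  const data = Array.from({ length: n }, (_, i) => `entry${i + 1}`)
  const index = new Map<string, number>()
  data.forEach((s, i) => index.set(s, i))

  // Worst case for the scan: the keyword matches the last entry.
  let scanned = 0
  for (const s of data) {
    scanned++
    if (s === `entry${n}`) break
  }

  // The map lookup is a single hashed access regardless of n.
  const mapLookups = index.has(`entry${n}`) ? 1 : 0

  return { scan: scanned, map: mapLookups }
}

console.log(comparisonsForWorstCase(500))  // → { scan: 500, map: 1 }
console.log(comparisonsForWorstCase(5000)) // → { scan: 5000, map: 1 }
```
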
# takeaways

Indexing is cool. If you find yourself in a situation where you're working with moderate-to-large arrays and are able to build an index in advance of querying the data, it's absolutely worth it from a query-performance standpoint.

However, *maintaining an index* is pretty expensive, and it also requires active preplanning when writing a datapack. When items are added, updated, or deleted, the index will also need to be updated, and a scheduled task should probably run every so often to audit indexes and identify potential errors. Retrofitting an index onto existing fields that don't already have one could be annoying.

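As a sketch of what that maintenance burden looks like (illustrative TypeScript, not datapack code): every write to the array also has to touch the index, and removals force a partial rebuild.

```typescript
interface Entry { id: number; string: string }

const array: Entry[] = []
const index = new Map<string, number>()

// Every mutation of the array must also update the index,
// or lookups will silently return stale or missing entries.
function addEntry(entry: Entry): void {
  index.set(entry.string, array.length)
  array.push(entry)
}

function removeEntry(keyword: string): void {
  const i = index.get(keyword)
  if (i === undefined) return
  array.splice(i, 1)
  index.delete(keyword)
  // Removal shifts every later entry left, so those index values
  // must be rebuilt -- part of why index maintenance is expensive.
  for (let j = i; j < array.length; j++) {
    index.set(array[j].string, j)
  }
}
```
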
Point being, if it's worth it, *it's worth it*; if it's not, the `one_macro` one-liner is simpler and fast enough for most applications.