---
title: when two macros are faster than one
draft: "false"
---

While working on my Database datapack (still WIP), I knew I'd want to find the most efficient way to dynamically access dynamically populated arrays. I had some ideas and decided to benchmark them using [Kragast's Benchmark Datapack](https://www.planetminecraft.com/data-pack/benchmark-6443027/). This process was really illuminating to me, and I hope it will be for you as well.

Thanks for all the help from **PuckiSilver**, **amandin**, and **Nicoder**.

# scenario

The following are the dataset and constraints I used to test different methods of accessing data within an array.

## dataset

The testing data is stored in the storage `#_macro.array`. The array is populated with a total of 500 entries, each having `id` and `string` fields.

```json
[
  { id: 1, string: "entry1" },
  ...
  { id: 500, string: "entry500" }
]
```

This dataset could also be represented as a table:

| id  | string     |
| --- | ---------- |
| 1   | "entry1"   |
| ... | ...        |
| 500 | "entry500" |

## constraints

The objective is to create an interface that receives a keyword, say `entry500`, and searches `#_macro.array` for an entry where the value of `string` matches the keyword. The keyword must be able to be entered by a player at runtime, and `#_macro.array` can have an arbitrary number of custom entries created by a player.

In TypeScript, it would look something like this:

```ts
function searchArray(keyword: string) {
  // logic
  return theRelevantEntry
}

searchArray('entry500')
```

In mcfunction, this is not so straightforward. Macros would make this really clean:

```vb
function test_namespace:search_array {keyword: "entry500"}
```

Unfortunately, macros come with a performance hit. In this particular situation, we can bypass macros altogether. While it's less elegant, it is more performant to store the keyword in NBT storage prior to calling the function.
The storage can be removed after the function is run:

```vb
data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'
function test_namespace:search_array
data remove storage test_namespace:test_namespace temp.keyword
```

Once the entry is found, it is stored in the `temp.result` storage, which can then be consumed by another function.

Now for the logic to do the actual array searching. Here, the performance hit of running macros is worth it, as the alternative involves a massive number of commands to manually iterate over the array. As we'll see later when benchmarking functions, manual iteration is *really* slow. Macros it is...

# one macro

Macros allow us to reach into our array and pick out the entry whose `string` property matches a given value. This is something that I didn't realize (for some reason) and was pointed out by **PuckiSilver** and **amandin** on the Datapack Hub discord server.

```vb
... one_macro.array[{string:$(keyword)}]
```

This method is super clean and results in a one-liner that is wordy but simple:

```vb
'# one_macro/_searcharray.mcfunction
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace one_macro.array[{string:$(keyword)}]
```

`_searcharray` can then be called using the `temp.keyword` storage:

```vb
'# one_macro/run.mcfunction
data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'
function test_namespace:one_macro/_searcharray with storage test_namespace:test_namespace temp
data remove storage test_namespace:test_namespace temp.keyword

'# call the function that consumes 'temp.result', then remove it
data remove storage test_namespace:test_namespace temp.result
```

# two macros

Another way to crack the problem is through indexing. This was my original plan back when I didn't realize that `...[{string:$(keyword)}]` was possible. This method requires the creation of an index of the field that is going to be searched.
The index is a compound of key/value pairs:

```json
{
  entry1: 0,
  entry2: 1,
  ...
  entry500: 499
}
```

The *key*, e.g. `entry2`, corresponds to the value of a `string` field in the main array, while the *value*, `1`, indicates the main array index where we'll find the full entry. The index can be searched with a direct path, `index.$(keyword)`, and the main array can then be accessed with a direct reference to the entry index, `array[$(index)]`.

Keep in mind that *the index must already exist prior to running the search function*. In a practical application, an index could be updated every time the main array is updated. A scheduled task could also audit the index to ensure that it's up to date.

The index search looks like this:

```vb
'# two_macro/_searchindex.mcfunction
$data modify storage test_namespace:test_namespace temp.index set from storage test_namespace:test_namespace two_macro.index.$(keyword)
```

And the array search looks like this:

```vb
'# two_macro/_searcharray.mcfunction
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace two_macro.array[$(index)]
```

The index and array search functions are then called using the `temp.keyword` storage:

```vb
'# two_macro/run.mcfunction
data modify storage test_namespace:test_namespace temp.keyword set value 'entry500'
function test_namespace:two_macro/_searchindex with storage test_namespace:test_namespace temp
function test_namespace:two_macro/_searcharray with storage test_namespace:test_namespace temp
data remove storage test_namespace:test_namespace temp.keyword
data remove storage test_namespace:test_namespace temp.index

'# call the function that consumes 'temp.result', then remove it
data remove storage test_namespace:test_namespace temp.result
```

# two is faster than one??

I ran benchmarks on a simple iteration-based function and the single-macro function suggested by **PuckiSilver** and **amandin**.
I also threw in the two-macro indexing function since I had already coded it. I assumed using one macro would be faster than two, but I was curious exactly *how* much faster it would be.

As expected, the iteration-based function was sloooooow. Both macro functions blew it out of the water. Unexpectedly, however, the `two_macro` function more than doubled the performance of the `one_macro` function. Here are the results (bigger is better):

| **function** | **benchmark** |
| ------------ | ------------- |
| iteration    | 416           |
| one_macro    | 30342         |
| two_macro    | 72450         |

The `two_macro` function is *2.4x* faster than the `one_macro` function. What the heck is going on? How does *adding* an entire second macro function *improve* performance??

It turns out that the clever and convenient `one_macro.array[{string:$(keyword)}]` triggers iteration to filter the array. Because that iteration happens in the game's Java code rather than in mcfunction commands, it's still much faster than iterating in mcfunction, but the performance hit is O(n). In contrast, the `two_macro` approach directly accesses values by key and index, and those operations have a performance hit of O(1). This was confirmed by **Nicoder**. While I haven't tested it, this means that, when run on a larger dataset, the gap between `two_macro` and `one_macro` should continue to widen.

# takeaways

Indexing is cool. If you find yourself in a situation where you're working with moderate-to-large arrays and are able to index in advance of querying data, it's absolutely worth it from a query performance standpoint.

However, *indexing* is pretty expensive, and it also requires active preplanning when writing a datapack. When items are added, updated, or deleted, the index will also need to be updated. A scheduled task should probably be run every so often to audit indexes and identify potential errors. Indexing existing fields that do not already have an index could be annoying.
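To make the trade-off concrete outside of mcfunction, here's a TypeScript sketch in the same spirit as the `searchArray` stub from the scenario. The names and shapes here are my own illustration of the cost model, not part of the datapack:

```ts
type Entry = { id: number; string: string };

// Model of the 500-entry dataset from the scenario.
const array: Entry[] = Array.from({ length: 500 }, (_, i) => ({
  id: i + 1,
  string: `entry${i + 1}`,
}));

// one_macro analogue: the path filter walks the list until it finds
// a match, so the cost grows with the array length -- O(n).
function searchOneMacro(keyword: string): Entry | undefined {
  return array.find((e) => e.string === keyword);
}

// two_macro analogue: a prebuilt index maps keyword -> position,
// so a search is two direct lookups (index, then array) -- O(1).
const index = new Map<string, number>(
  array.map((e, i) => [e.string, i] as [string, number]),
);

function searchTwoMacro(keyword: string): Entry | undefined {
  const i = index.get(keyword);
  return i === undefined ? undefined : array[i];
}

// The maintenance cost: every mutation of the array must also
// touch the index, or lookups will silently go stale.
function addEntry(entry: Entry): void {
  index.set(entry.string, array.length);
  array.push(entry);
}
```

Nothing here is mcfunction, but the shape of the cost is the same: `find` touches up to every element, while the `Map` lookup does not depend on how long the array is.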
Point being: if it's worth it, *it's worth it*; if it's not, the `one_macro` one-liner is simpler and fast enough for most applications.