From 3168fa8b9270d0db9b3e4179065e8026fdca279a Mon Sep 17 00:00:00 2001
From: themodernhakr
Date: Tue, 25 Mar 2025 00:31:54 -0500
Subject: [PATCH] vault backup: 2025-03-25 00:31:54

---
 .../When Two Macros are Faster than One.md    | 24 +++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/Minecraft Datapacking/When Two Macros are Faster than One.md b/Minecraft Datapacking/When Two Macros are Faster than One.md
index e862f73..53852cb 100644
--- a/Minecraft Datapacking/When Two Macros are Faster than One.md
+++ b/Minecraft Datapacking/When Two Macros are Faster than One.md
@@ -2,11 +2,11 @@
 title: when two macros are faster than one
 draft: "false"
 ---
-While working on my Database datapack (still WIP), I knew I'd want to find
+While working on my Database datapack (still WIP), I knew I'd want the most efficient way to dynamically access dynamically populated arrays. I had some ideas and decided to benchmark them using [Kragast's Benchmark Datapack](https://www.planetminecraft.com/data-pack/benchmark-6443027/). The process was really illuminating for me, and I hope it will be for you as well. Thanks to **PukiSilver**, **amandin**, and **Nicoder** for all the help.
 # scenario
 ## dataset
-The data is stored in a storage `#_macro.array`. Array is populated with a total of 500 entries, each having `id` and `string` fields.
+The test data is stored in the storage `#_macro.array`. The array holds 500 entries, each with `id` and `string` fields.
 ```json
 [
 {
@@ -114,4 +114,24 @@
 data remove storage test_namespace:test_namespace temp.index
 '# call the function that consumes 'temp.result', then remove it
 data remove storage test_namespace:test_namespace temp.result
 ```
+# two is faster than one??
+I ran benchmarks on a simple iteration-based function and the single-macro function suggested by **PukiSilver** and **amandin**. I also threw in the two-macro indexing function since I had already coded it. I assumed one macro would be faster than two, but I was curious exactly *how* much faster it would be.
+As expected, the iteration-based function was sloooooow. Both macro functions blew it out of the water. Unexpectedly, however, the `two_macro` function more than doubled the performance of the `one_macro` function. Here are the results (bigger is better):
+
+| **function** | **benchmark** |
+| ------------ | ------------- |
+| iteration    | 416           |
+| one_macro    | 30342         |
+| two_macro    | 72450         |
+The `two_macro` function is about *2.4x* faster than the `one_macro` function.
+
+What the heck is going on? How does *adding* an entire second macro function *improve* performance??
+
+It turns out that the clever and convenient `one_macro.array[string:$(keyword)]` triggers iteration to filter the array. Because that filtering runs directly in Java code, it's still much faster than iterating in mcfunction, but its cost is O(n): the more entries the array holds, the longer the filter takes. In contrast, the `two_macro` approach directly accesses values by `key` and `index`, and both of those operations are O(1). While I haven't tested it, this means the gap between `two_macro` and `one_macro` should continue to widen on a larger dataset.
+# takeaways
+Indexing is cool. If you're working with moderate-to-large arrays and can build an index before you query the data, it's absolutely worth it from a query performance standpoint.
+
+However, *indexing* is pretty expensive, and it requires active preplanning when writing a datapack. Whenever items are added, updated, or deleted, the index needs to be updated too. A scheduled task should probably run every so often to audit indexes and identify potential errors. Retrofitting an index onto fields that don't already have one could be annoying.
+
+Point being, if it's worth it, *it's worth it*; if it's not, the `one_macro` one-liner is simpler and fast enough for most applications.
\ No newline at end of file