vault backup: 2025-03-25 00:31:54

themodernhakr 2025-03-25 00:31:54 -05:00
parent 7d6c3784a3
commit 3168fa8b92


@ -2,11 +2,11 @@
title: when two macros are faster than one
draft: "false"
---
While working on my Database datapack (still WIP), I knew I'd want to find the most efficient way to dynamically access dynamically populated arrays. I had some ideas and decided to benchmark them using [Kragast's Benchmark Datapack](https://www.planetminecraft.com/data-pack/benchmark-6443027/). This process was really illuminating to me, and I hope it will be for you as well. Thanks for all the help from **PukiSilver**, **amandin**, and **Nicoder**.
# scenario
## dataset
The testing data is stored in the storage `#_macro.array`. The array is populated with a total of 500 entries, each having `id` and `string` fields.
```json
[
{
@ -114,4 +114,24 @@ data remove storage test_namespace:test_namespace temp.index
# call the function that consumes 'temp.result', then remove it
data remove storage test_namespace:test_namespace temp.result
```
# two is faster than one??
I ran benchmarks on a simple iteration-based function and the single-macro function suggested by **PukiSilver** and **amandin**. I also threw in the two-macro indexing function since I had already coded it. I assumed using one macro would be faster than two, but I was curious exactly *how* much faster it would be.
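For reference, the iteration-based function is the usual recursive "pop and compare" loop. The sketch below is just to show the shape of it; the storage paths, the `#match` score holder, and the `bench.tmp` objective are illustrative rather than the exact benchmark code, and it assumes `temp.copy` was set from the array and `temp.keyword` was set to the search string before the first call.
```mcfunction
# iterate_search sketch: scan a working copy of the array one element at a time
# look at the first remaining entry
data modify storage test_namespace:test_namespace temp.current set from storage test_namespace:test_namespace temp.copy[0]
# "set from" fails when nothing changes, so success=0 means the strings already matched
execute store success score #match bench.tmp run data modify storage test_namespace:test_namespace temp.current.string set from storage test_namespace:test_namespace temp.keyword
# on a match, keep the whole entry as the result
execute if score #match bench.tmp matches 0 run data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace temp.copy[0]
# otherwise drop this entry and recurse until the copy runs out
data remove storage test_namespace:test_namespace temp.copy[0]
execute if score #match bench.tmp matches 1 if data storage test_namespace:test_namespace temp.copy[0] run function test_namespace:iterate_search
```
Every element costs a function call plus several `data` commands, so the per-lookup work grows linearly with the array.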
As expected, the iteration-based function was sloooooow. Both macro functions blew it out of the water. Unexpectedly, however, the `two_macro` function more than doubled the performance of the `one_macro` function. Here are the results (bigger is better):
| **function** | **benchmark score** |
| ------------ | ------------- |
| iteration | 416 |
| one_macro | 30342 |
| two_macro | 72450 |
The `two_macro` function is *2.4x* faster than the `one_macro` function (72450 / 30342 ≈ 2.4).
What the heck is going on? How does *adding* an entire second macro function *improve* performance??
It turns out that the clever and convenient `one_macro.array[string:$(keyword)]` still iterates over the array to find the match; it just does so inside the game's own Java path-matching code instead of in mcfunction. That makes it much faster than iterating in mcfunction, but the cost still grows with the array: O(n). In contrast, the `two_macro` approach jumps straight to a value by `key` and then by `index`, and those lookups are O(1). While I haven't tested it, this means that on a larger dataset the gap between `two_macro` and `one_macro` should only widen.
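To make that concrete, here's roughly what the two access patterns look like written out in full path syntax. The `index.$(keyword)` compound, the function names, and the `temp` scratch paths are just how I'm illustrating it, and it assumes each `string` value is unique and simple enough to double as a bare compound key:
```mcfunction
# one_macro sketch: the [{...}] filter makes the game scan the array for the match (O(n))
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace array[{string:"$(keyword)"}]

# two_macro sketch, function 1: look the keyword up in a prebuilt index compound (O(1))
$data modify storage test_namespace:test_namespace temp.index set from storage test_namespace:test_namespace index.$(keyword)
function test_namespace:get_by_index with storage test_namespace:test_namespace temp

# two_macro sketch, function 2 (test_namespace:get_by_index): jump straight to that element (O(1))
$data modify storage test_namespace:test_namespace temp.result set from storage test_namespace:test_namespace array[$(index)]
```
The second hop does cost an extra `function ... with storage` call, but that's a fixed price, while the filter's scan keeps growing with the array.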
# takeaways
Indexing is cool. If you find yourself in a situation where you're working with moderate-to-large arrays and are able to index in advance of querying data, it's absolutely worth it from a query performance standpoint.
However, *indexing* is itself pretty expensive, and it requires planning ahead while writing the datapack. Whenever an item is added, updated, or deleted, the index has to be updated with it, and a scheduled task should probably run every so often to audit the indexes and identify potential errors. Retrofitting an index onto a field that already has data in it could also be annoying.
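As a sketch of what that maintenance might look like, an append helper (run as a macro function with `id` and `string` arguments; the paths are again just illustrative) could keep the index in sync like this:
```mcfunction
# add_entry sketch: append a new entry and record its position in the index
# data get on a list returns its length, i.e. the 0-based slot the new entry will occupy
execute store result storage test_namespace:test_namespace temp.pos int 1 run data get storage test_namespace:test_namespace array
# append the entry itself
$data modify storage test_namespace:test_namespace array append value {id:$(id),string:"$(string)"}
# point the index at the new position
$data modify storage test_namespace:test_namespace index.$(string) set from storage test_namespace:test_namespace temp.pos
```
Deletes are the painful case: removing an element shifts every later position, so you either rebuild the index afterwards or tombstone entries instead of removing them.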
Point being: if it's worth it, *it's worth it*; if it's not, the `one_macro` one-liner is simpler and fast enough for most applications.