Summary: AutoNumberHash128(A, B) runs about 30% faster than AutoNumber(A &’-‘ & B).
It’s a common practice to use the script AutoNumber() function to reduce the storage required for large compound keys in a Qlik data model. For example:
AutoNumber(A & '-' & B) as %KeyField
As a standard practice, we generally include a separator like ‘-‘ to ensure ‘1’ & ’11’ does not get confused with ’11’ & ‘1’.
The AutoNumber process can add significant run time to a script with many rows.
I’ve always wondered what the AutoNumberHash128() function was good for.
AutoNumberHash128(A,B) as %KeyField
This function first hashes A & B and then autonumbers the result. The end result is the same as the first example given using AutoNumber(). I find the AutoNumberHash128 syntax a bit simpler as a separator is not required.
What surprised me is that the AutoNumberHash128() function runs faster. Typically about 30% faster than a plain AutoNumber with a concatenated string parameter.
Why is it faster? The difference is in the function used to create the single value to be autonumbered. Hash128 is considerably faster than string concatenation (&).
AutoNumberHash128() can take any number of fields, but it does not have an “AutoId” parameter. The “AutoId” (second parameter) in AutoNumber() is recommended to ensure we get sequential integers when autonumbering more than one key field. Sequential integers are the most memory efficient storage for keys.
Don’t despair. AutoNumberHash128() will use the “default” AutoId. That is fine if you are autonumbering only one key. If you are doing more than one key, use AutoNumberHash128() for your largest — most rows — key and use AutoNumber() with AutoId for the rest. You will improve the script run time of one key.
Another possible tradeoff when you have many large keys is to use AutoNumberHash128 for all keys and forgo the sequential integer optimization. You will use only 8 bytes per key value which could be significantly less than the original string keys.