AutoNumber vs AutoNumberHash128

Summary:  AutoNumberHash128(A, B) runs about 30% faster than AutoNumber(A &’-‘ & B).

It’s a common practice to use the script AutoNumber() function to reduce the storage required for large compound keys in a Qlik data model. For example:

AutoNumber(A & '-' & B) as %KeyField

As a standard practice, we generally include a separator like ‘-‘ to ensure ‘1’ & ’11’ does not get confused with ’11’ & ‘1’.

The AutoNumber process can add significant run time to a script with many rows.

I’ve always wondered what the AutoNumberHash128() function was good for.

AutoNumberHash128(A,B) as %KeyField

This function first hashes A & B and then autonumbers the result. The end result is the same as the first example given using AutoNumber().  I find the AutoNumberHash128 syntax a bit simpler as a separator is not required.

What surprised me is that the AutoNumberHash128() function runs faster.  Typically about 30% faster than a plain AutoNumber with a concatenated string parameter.

Why is it faster?  The difference is in the function used to create the single value to be autonumbered.  Hash128 is considerably faster than string concatenation (&).

AutoNumberHash128() can take any number of fields, but it does not have an “AutoId” parameter.  The “AutoId” (second parameter) in AutoNumber() is recommended to ensure we get sequential integers when autonumbering more than one key field.  Sequential integers are the most memory efficient storage for keys.

Don’t despair.  AutoNumberHash128() will use the “default” AutoId.  That is fine if you are autonumbering only one key.  If you are doing more than one key, use AutoNumberHash128() for your largest — most rows — key and use AutoNumber() with AutoId for the rest.  You will improve the script run time of one key.

Another possible tradeoff when you have many large keys is to use AutoNumberHash128 for all keys and forgo the sequential integer optimization.  You will use only 8 bytes per key value which could be significantly less than the original string keys.

-Rob

 

Share

3 thoughts on “AutoNumber vs AutoNumberHash128”

  1. Nice one, Rob. Good to know hashing is faster. Funnily, I have seen loads of developers using the Hash and Autonumber functions to join the fields and still keeping the original long-concatenated field. I see why they need it but keeping in the QVW file doesn’t make sense. Rather I would have it stored in the QVD file.

  2. Hi Rob, great post.
    I was recently was informed of the existence of the function AutoNumberHash256, which since then I started using to concatenate dimension tables to account for primary keys being changed in the source system tables resulting in missing associations with older rows in my fact table. For this the following approach was used:

    // Transform Source Table

    PreviousTable:
    NoConcatenate
    LOAD *
    FROM [Table.QVD] (qvd);

    STORE PreviousTable into Table_Previous.QVD (QVD);

    NewTable:
    NoConcatenate LOAD
    AutonumberHash256(PK1, PK2, …….) as PrimaryKey,
    *
    FROM [Table_New.QVD] (qvd);

    Concatenate (NewTable)
    LOAD *
    Resident PreviousTable
    Where not Exists(PrimaryKey, AutonumberHash256(PK1, PK2, …….));

    DROP Table PreviousTable;
    DROP Field PrimaryKey From NewTable;

    STORE NewTable into Table_New.QVD (QVD);

    DROP Table NewTable;

    QUESTION: What would be the advantage of AutonumberHash128 be over AutonumberHash256 or vice versa?

  3. Thanks for posting, Rob.
    I Agree, AutoNumber() itself is fast but string concatenation slows it down.
    A possible work-around is to create the concatentated strings beforehand, in Extract stage where incremental load is used. This way we can still use AutoNumber() in the final script, avoiding slow-down due to “string&string” processing.

    When load speed is important, it’s worth to point out that any string concatenation in the load statement will slow down the processing – be it with or without AutoNumber().
    I compared different functions (AutoNumber, Hash, RowNo, RecNo etc) some time ago – results published in the community:
    https://community.qlik.com/thread/283671

Leave a Reply

Your email address will not be published. Required fields are marked *