Today let’s reinforce some QV data model principles for us old timers and introduce a few tricks for newbies to Qlikview.
#1. Keys are not data. Key Fields in a Qlikview model should serve the data modeler, and not be used by chart Designers as data.
Consider two tables, OrderHeader (one row for each Order) and OrderDetails (one row for each Order Line). linked together by Field OrderID.
OrderID may be a value that you need to display in your charts. However, problems arise when you try to do things like count(OrderID). Which end of the connection should the expression count? It’s unreliable as discussed in detail here:
http://qlikviewnotes.blogspot.com/2010/01/best-way-to-count-keys-dont.html
The solution is to create a counter field on the table that represents the correct cardinality for the counter. If we are counting Orders, that would be the OrderHeader table.
In the LOAD of the OrderHeader table:
1 as OrderCounter
Part two of of the recommendation is to isolate the key field so it is not mistakenly used as a data field. We do this by prefixing the key field name with a special character and SETing the QV system variable “HidePrefix” to that character.
SET HidePrefix=’%’;
In the LOAD of both OrderHeader and OrderDetails:
OrderID as %OrderID
Fields that begin with the HidePrefix value will not show up in:
– Current Selections.
– Dimension or Expression property dialog (unless “Show System Fields” is checked).
Of course, the specific values of OrderID may be useful to display in UI charts. In that case we must preserve it as a data item in one and only one table. We will include it in the LOAD of the OrderHeader table. Our data model now looks like this:
OrderID is available as a data field, but appropriately only from the OrderHeader table.
OrderCounter is now available as a field such that
=sum(OrderCounter)
will yield the correct Order count.
Now we (the Data Modelers!) own those “%” key fields! They are ours, we told the UI designers explicitly that “it is not data” .
Part three, and a very important part indeed, is to autonumber() the key fields. Autonumber() is a Qlikview lookup function that translates parameter values into integers. The sequential integers returned by autonumber() will reduce the RAM requirements and increase the linkage efficiency as detailed here
http://qlikviewnotes.blogspot.com/2008/05/memory-sizes-for-data-types.html
and
http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers
Admittedly, it gets a bit geeky. Bottom line, here is what you want to do for your keys:
autonumber(OrderID, ‘%OrderID’) as %OrderID
The autonumber() function converts values to sequential integers. The second parameter, ‘%OrderID’, is important if you have multiple keys being autonumber() in your script.
To summarize:
#1. Keys are not data.
#2, Isolate keys using the “SET HidePrefix=x;” variable. Establish counter fields on the appropriate table.
#3. Use the Autonumber() function to convert key field values to sequential integers. This will minimize the memory footprint of the application and improve the efficiency of cross table look-ups.
My friend and colleague Barry Harmsen, author of QlikView 11 for Developers, who is a much wiser data modeler than I, will be discussing QV data modeling in depth with me at the Masters Summit for Qlikview in Europe this October. I hope you can join us!
Hi, I love Your blog, especially helpful for me was the Now() trap 🙂
Regarding this article about autonumber() i was wondering what is the difference regarding autonumber() and autonumberhash128/256, which is better for response time and memory usage ?.
Also even if i do prepare key like (with hide suffix “_”)
AutonumbertHash256(Customer ID, Organization ID, Division ID) as ORG_KEY_
in each other table which uses this key for binding i do it in the same way, but as i understand it would be better to use autonumber with parameter ‘ORG_KEY_” ?
Thanks for all your work on the Blog 🙂
Best regards,
Łukasz
I’m glad you’re enjoying the blog and that I helped you escape the trap 🙂
autonumberhash() first hashes the field values and then creates an autonumber from the hash. The syntax of autonumberhash() is slightly easier when using multiple input fields because you don’t have to use the & and a separator character. Both functions should give the same result although autonumber() is technically more correct. There is a *very small* chance of a hash-collision (incorrect duplicate values) using the hash form.
I’m actually not sure why the autonumberhash() forms were created.
I have not noticed much difference in performance between the two, but I have not done any formal study.
The second parameter ‘ORG_KEY_’ is only necessary if you are sutonumbering two or more fields (not the same fieldname on different tables, but two distinct fields). I usually include the second parameter even when doing one field just in case I add another field later.
Ok as autonumber was first introduced (i was working with QV from version 8.5) i always thought that autonumberhash is some kind of upgrade, more reliable and easier to use.
Also i think (memory is not so good ay more 😉 ) that i faced some issues when i had format issues for example for some reason in one table the id field was recognized as a mixed string and in other table the same field (but from different DB table) could be only numeric. then i think the problem could be that the autonumber key for that field could be different even if the value is in fact the same.
Also another question please 🙂
when you use Autonumber() with lots of field with & parameter are the field values not turned into string or validated in some way ? What i am trying to say, is that maybe autonumber takes a bit more time to generate than autonumberhash. but i also did not study this subject, just got used to autonumberhash and was intrigued by the current post with autonumber, where i would rather see autonumberhash as a newer version of the same functionality 😉 which thanks to You i now know that still have some ace in the sleeve 🙂
Just so I understand as I am using a link table. For Example.
The AutoNumber() would be if joining as in your example on 1 field to another between two tables.
But in a Link Table scenario I would create something like this.
AutoNumberHash(UserID, DeptID, Date) as %KeyField
Or do you do some sort of AutoNumber() on each field and then combine like this.
Load *,
‘%UserID’ & ‘-‘ & ‘%DeptID’ & ‘-‘ & ‘%Date’ as %KeyField;
Load
AutoNumber(UserID, ‘%UserID’) as %UserID,
AutoNumber(DeptID, ‘%DeptID’) as %DeptID,
AutoNumber(Date, ‘%Date’) as %Date
Thank you very much Rob.
Always getting great tips from you.
Darrin
Hi Darrin,
You should autonumber the fields together like
AutoNumberHash(UserID, DeptID, Date) as %KeyField
-Rob
Hi,
I have a field that contains hierarchie like that :
“France/Sales/Shoes”.
If I understand well the unique value tips it would consume less RAM to split this field into 3 fields using subfield function. But if we want to show the full hierarchie in one columnof the charts, we will use field1&’/’&field2&’/’field3, will it consume more RAM when using it in this case ?
It will use less RAM if you split the fields. But it will use more CPU at runtime when you need to combine them in a chart. It’s a tradeoff. If you don’t use the combination in many charts, splitting them may pay off.
Hi Rob,
Just wanted to say fantastic blog. Thanks for sharing so generously your knowledge and keeping it updated and relevant.
Cheers,
Byron
Hello Rob,
Thank you for this. But my data is over a billion records and I am incrementally loading data. For the key field (composite key) I am using AutoNumberHash256(). I do see some cases of collision.
I am inclined to use AutoNumber instead. But then how do I ensure uniqueness when I perform incremental load ? Any thoughts ?
Interesting that you are getting collisions. I think your only alternative in this case is to maintain the key as concatenated fields in the QVD. Then perhaps autonumber in the final dashboard to save RAM.
Hello Rob, Thanks for the quick response. But am afraid that wont may / maynot work.
Am dealing with a customer whose data is really massive. My main link table which has 6 keys is around 100 billion records. Now if I have to autonumber the key fields everytime I load the model then I would be autonumbering in the model load time – do you think that would work ?
So the intial QVD generators have the raw field but the model has the Autonumber ?
Also, in the same application if I have a loop to load every month, the autonumber sequence will still be maintained I suppose. Can you please confirm.
Thanks,
Autonumber is unique and consistent within a load, so the sequence will be maintained in the loop, as long as it all happens within the same script run.
A link table of 1000B rows? Does the application perform adequately?
When dealing with something this large, “standard QV” recommendations may not apply. For example, you want to consider moving all ETL activity into a data warehouse.
To my surprise it is okay so far *fingers crossed*
When you refer to moving all ETL activity, do you mean all transformations we do at QVD generator level or even the QVD for the datamodel ?
Yes, I move all Extract and transformation into a DW so the final QV only needs to do SELECT *.
Hi Rob,
I am looking for source code or algorithm of hash256() function in order to validate a value sent from QV to Java App via WebPageViewer2. Do you know if it is possible to get it and from where?
The idea is to sent some user’s selection in link. This link should contain also hashed value of the selection. Java App then validates this link and if hash value is the same on Java side it lets user in. If the hash is not found then user doesn’t get access.
Thanks in advance!
When I google “hash256 code” I fond lots of examples.
Hi,
Have you noticed / is it obvious that after “Reduce Data -> Keep possible values” or reduction with section access, the autonumber optimization is lost?
Thanks
I never thought about this, but it makes sense. If the reduction causes the list to no longer be sequential, the list would become a standard list of values.
Can u show me the examples
Rob, what about Autonumber() and partial load? As far as I understand, it is not working, so it is better to autonumber such fields (which tables participate in partial load) somewhere before it comes to partial load in the final app, isn’t it?