“Fastest” Method to Read max(field) From a QVD

Gather any group of enthusiasts together over a pint after work and inevitably there will be debate over what I call a “Pub Question”. Comic Fans will argue who would win in a fight between Batman and Spiderman, Web Programmers will debate the merits of Java vs .Net. For a group of Qlikview Script developers, the question may be “what’s the fastest script to get max value of a field from a QVD?”.

Qlikview has virtual Pubs in the Community Forums and blogs where these questions get raised. In a recent blog post http://qlikviewnotes.blogspot.com/2013/08/super-fast-method-to-retrieve-qvd-high.html I got some comments about this very question, including a couple of methods I had not tried before.

Let’s dispense with the Resident Table first. The fastest method for a Resident Table is loading the FieldValue()s.

LOAD Max(Id) as MaxId;
LOAD FieldValue(‘Id’, recno()) as Id
AUTOGENERATE FieldValueCount(‘Id’); 

This is much faster than loading From Resident like this:

LOADmax(Id) Resident data; 

Here’s a comparison of the two:

For the test, I use a table with 10M rows and 7 fields. I vary the cardinality of the “Id” field from 1:1 to 100K:1. The x-axis of the chart shows the factor and the number of unique values (rows / factor).

The left side of the chart represents the maximum unique values at 1:1. The rightmost side of  the chart is least unique with only 100 values.  These ranges represent some real world cases such as:
1:1 — Primary Keys
2:1 — Timestamp values
1000 values — Date values

For a 1:1 case, there is not much difference between the FieldValues and Resident methods. However, as the number of values decrease, the FieldValue method follows in a linear trend. The Resident stays fairly constant as the same number of rows are being read.

Now on to the answering the “Fastest” when loading from a QVD. The answer, like all good Pub Questions, is “it depends”.  I ran timing tests on the following methods:

QVD :

LOADmax(Id) FROM (qvd); 

QVD-*/FieldValues:
LOAD * FROM (qvd);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
QVD-DISTINCT-Field/FieldValues:
LOAD DISTINCT Id FROM (qvd);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
QVD-DISTINCT-Field/Resident:
LOAD DISTINCT Id FROM (qvd);
LOADmax(Id) Resident table;
QVD-Field/FieldValues:
LOAD Id FROM (qvd);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
QVD-Field/Resident:
LOAD Id FROM (qvd);
LOAD max(IdResident table;

Everything but the first case will be an Optimized QVD Load.

Here are the test results.

Surprisingly, the QVD only case is the winner for Factors 1-3, even though it is an un-optimized Load. This is not so surprising when you consider how data is stored in a QVD. The field values are stored in a symbol array, just like a qvw in RAM. Reading a single field would require only reading in those symbols. Combining the max() logic with this step would be efficient.

Note that the LOAD RESIDENT operation doesn’t benefit much from a deceasing number of values. At the 1:1 end, all techniques are processing the same quantity of 10M values. As we move to the right, LOAD RESIDENT continues to process 10M values (rows) while the other techniques benefit from lessor quantities.

Let’s drill in a bit. I’ll exclude the QVD-Field/Resident test (for scale) and narrow the Factors to the middle stuff, 4 – 1000.

And here’s the total time for those selections.

So…if I had to pick the “fastest”:
– For 1:1 cardinality, like Primary Keys or revision numbers, direct from the QVD is fastest.
– For most other general cases, Loading just the field and then processing Fieldvalues is fastest.

Of course, there are a number of factors beside the data profile that can impact the results. Disk speed, number of cores, etc. I’d love to hear if you have different experiences.

My colleagues at the upcoming October 9 Masters Summit for Qlikview in London and Barcelona may have some thoughts of their own on this question. In addition to core topics, Oleg is presenting “Performance Tuning” and Bill is presenting “Tips & Tricks”. Both sessions should provide some practical pointers for speeding up your apps. There is still time to register for the Summit. If you attend, perhaps we can continue this discussion at an evening “Pub Session”.

-Rob

Share

25 thoughts on ““Fastest” Method to Read max(field) From a QVD”

  1. Hi,

    I’ve tried the example with QVD-Field/FieldValues and I can’t get it to work. It only seems to generate a ?? value. Can you post and example file?

    Thanks

  2. Hi Rob,
    thanks for this interesting test application. If the data are ordered like in this case (Id=RecNo) I guess the fastest option should be QVD-Field/Peek.. 😉
    Best regards,
    Ralf

  3. Btw. there is a bug in the QVW download, the field name
    “Test” should be “TestSource”,
    “Factor” should be “TestFactor”
    to get the UI working right:

    SUB End
    TestLog:
    LOAD
    ‘$(vTestFactor)’ as Factor,
    ‘$(vTestSource)’ as Test,
    $(vTestStart) as TestStart,
    GetTime() as TestEnd
    AutoGenerate 1
    ;
    END SUB

  4. Hi Ralf,
    A QVD, due to delta loads, is not likely to be ordered by Id. However, that the delta rows would be on top and sorted DESC in which case the fastest option may be:
    FIRST 1 LOAD Id…;

    P.S. I thought a reload would have taken care of that UI fieldname change, but I’ll upload another copy along with some more tests.

  5. Hi Rob, I observed some times FieldValue & IterNo gave good performance over Resident:

    LOAD MAX(TEMP) AS MaxID;
    LOAD FieldValue(‘ID’,IterNo()) AS TEMP
    Autogenerate 1 while NOT IsNull(FieldValue(‘ID’,IterNo());

  6. Hi Rob,
    I simply can’t see where I’m going wrong with your method for finding the largest record of a field in a resident table. I’ve given up trying in a large real life example and been reduced to trying a simple INLINE table but still my script is wrong.
    In this example I expect to obtain a table tblMaxNum with a single field MaxNum of value 45 but instead I get a value of ‘?’.

    If you can spare a minute or two to relieve my torment I will be forever grateful.

    Num:
    LOAD * INLINE [
    Num
    1
    45
    3
    6
    8
    3
    ];

    tblMaxNum:
    LOAD max(FieldValue(‘Num’,RecNo())) as MaxNum
    AutoGenerate FieldValueCount(‘Num’);

    I can get the suggestion of the previous poster (Sree M, thanks) to work but not yours
    tblMaxNum:
    LOAD MAX(TEMP) AS MaxNum;
    LOAD FieldValue(‘Num’,IterNo()) AS TEMP
    Autogenerate 1 while NOT IsNull(FieldValue(‘Num’,IterNo()));

    Cheers

    Andrew Walker

    1. since an old post… for future readers … I think the ‘?’ comes as a result that your data is NOT distinct. You have the value ‘3’ repeat a second time in your data. My understanding is that FieldValue() works on recordsets that are DISTINCT. I do know that FieldvalueCount() does return the number of DISTINCT records in the field. If the field does not have distinct values then you can get a ‘?’ as a result of using FieldValue() . Rob – is this accurate or am i missing something?

  7. Hi Andrew,
    You couldn’t get my example to work because the example was wrong. The preceding load got lost in a migration between hosts. Sorry for that. It’s been corrected and here’s how your example should look.

    tblMaxNum:
    LOAD max(Num) as MaxNum;
    LOAD FieldValue(‘Num’,RecNo()) as Num
    AutoGenerate FieldValueCount(‘Num’);

    1. Hi Rob,

      I was trying to implement Andrew’s example and i get message:
      “General error
      Invalid Autogenerate count: -1e+300
      LOAD FieldValue(‘Num’,RecNo()) as Num
      AutoGenerate FieldValueCount(‘Num’)”

      Where did I go wrong?

      Cheers
      Cezary Pospiszyl

  8. Hi Rob –
    I’ve tried the suggested solution I must say it’s critically fine. Compared with the “Resident” method it reduced the elapsed time by 99% or more. Objective was to find the max date on a base table of approximately 3.5 MIO rows.
    Really fine, thank you!

  9. Hi Rob

    Implemented the following:

    LOAD Id FROM data.qvd(qvd) WHERE NOT EXISTS(Id);
    LOAD max(Id) Resident Max;

    And had a 32.5% improvement compared with:

    LOAD Id FROM (qvd);
    LOAD max(FieldValue(‘Id’, recno()))
    AUTOGENERATE FieldValueCount(‘Id’);

    Thanks 🙂

  10. Hi Rob –

    Is it possible to implement this method when we have to find max by another field? For instance, if I have to get max(Sales) by region. Is it doable using above method?

    Best,
    Sunny

    1. Hi Sunny,

      The fieldvalues() method would not work when grouping by another field. FieldValues() works on the symbol tables. The linkage between a “Sales” values and a “Region” values require that you go through the data by row,

  11. Hi

    I know this is an old topic. But in relation to Sunnys question – have you done some simarlar testing on best performance on max values on dimensions.

    I have a data set on approximately 8 billions rows of changing data – and need to find the min and max values for 4 millions different dimensions. So if someone already have done some simalar testing im very interested to learn about the findings ?

    /Martin

  12. I get a ‘?’ returned when using Fieldvalue(‘MyDate’). My current understanding is this occurs because the values in ‘MyDate’ are not distinct – I have several records for the same date. Likewise the ‘FieldvalueCount(‘MyDate’)’ returns the count of disitinct dates in ‘MyDate’. So I get an 8 when actually have 25 records.

    What is the most expedited process for getting the max (and min) for an existing field that does have duplicates?

    Thanks.

    1. The “?” format is a side effect of using FieldValue(). It sometimes gets confused about the display format, although the internal value is correct. You can fix it by adding a formatting function like Date() to the expresisson like:

      Date(max(FieldValue(‘Id’, recno())))

  13. Hi Rob,

    Thanks for taking the time to make such great Qlik content! I have seen your article linked several times in the Qlik community in reference to Qlik Sense performance tuning. When I read the article, I was hesitant to assume the results would port directly over to Qlik Sense. So I created my own QS app based on your QV app and I didn’t see similar performances results, specifically the FieldValues category was consistently much slower. Is it safe to say performance tuning in QlikView should not be assumed to be that same as performance tuning in Qlik Sense?

    1. Hi Will,

      I expect the performance principles and metrics to be the same between Qlik Sense and QlikView in most cases. I’m surprised you saw slower performance with FieldValues. Qlik Sense does benefit from the performance improvements to load Resident (also in QV 12.20) so it may be time for me to retest. I’ll update the post if I find anything significant.

  14. How can i use above example when i want to find max date from a particular segment .. i want to use where condition based on Segment column of the same Table

    1. You cannot use FieldValues() to target values form a particular table or the relationship between multiple fields.

Leave a Reply to Martin Cancel reply

Your email address will not be published.