“Fastest” Method to Read max(field) From a QVD

Gather any group of enthusiasts together over a pint after work and inevitably there will be debate over what I call a “Pub Question”. Comic Fans will argue who would win in a fight between Batman and Spiderman, Web Programmers will debate the merits of Java vs .Net. For a group of Qlikview Script developers, the question may be “what’s the fastest script to get max value of a field from a QVD?”.

Qlikview has virtual Pubs in the Community Forums and blogs where these questions get raised. In a recent blog post http://qlikviewnotes.blogspot.com/2013/08/super-fast-method-to-retrieve-qvd-high.html I got some comments about this very question, including a couple of methods I had not tried before.

Let’s dispense with the Resident Table first. The fastest method for a Resident Table is loading the FieldValue()s.

LOAD Max(Id) as MaxId;
LOAD FieldValue(‘Id’, recno()) as Id
AUTOGENERATE FieldValueCount(‘Id’); 

This is much faster than loading From Resident like this:

LOADmax(Id) Resident data; 

Here’s a comparison of the two:

For the test, I use a table with 10M rows and 7 fields. I vary the cardinality of the “Id” field from 1:1 to 100K:1. The x-axis of the chart shows the factor and the number of unique values (rows / factor).

The left side of the chart represents the maximum unique values at 1:1. The rightmost side of  the chart is least unique with only 100 values.  These ranges represent some real world cases such as:
1:1 — Primary Keys
2:1 — Timestamp values
1000 values — Date values

For a 1:1 case, there is not much difference between the FieldValues and Resident methods. However, as the number of values decrease, the FieldValue method follows in a linear trend. The Resident stays fairly constant as the same number of rows are being read.

Now on to the answering the “Fastest” when loading from a QVD. The answer, like all good Pub Questions, is “it depends”.  I ran timing tests on the following methods:


LOADmax(Id) FROM (qvd); 

LOAD * FROM (qvd);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
LOADmax(Id) Resident table;
LOAD Id FROM (qvd);
LOAD max(FieldValue(‘Id’, recno())) 
AUTOGENERATE FieldValueCount(‘Id’);
LOAD Id FROM (qvd);
LOAD max(IdResident table;

Everything but the first case will be an Optimized QVD Load.

Here are the test results.

Surprisingly, the QVD only case is the winner for Factors 1-3, even though it is an un-optimized Load. This is not so surprising when you consider how data is stored in a QVD. The field values are stored in a symbol array, just like a qvw in RAM. Reading a single field would require only reading in those symbols. Combining the max() logic with this step would be efficient.

Note that the LOAD RESIDENT operation doesn’t benefit much from a deceasing number of values. At the 1:1 end, all techniques are processing the same quantity of 10M values. As we move to the right, LOAD RESIDENT continues to process 10M values (rows) while the other techniques benefit from lessor quantities.

Let’s drill in a bit. I’ll exclude the QVD-Field/Resident test (for scale) and narrow the Factors to the middle stuff, 4 – 1000.

And here’s the total time for those selections.

So…if I had to pick the “fastest”:
– For 1:1 cardinality, like Primary Keys or revision numbers, direct from the QVD is fastest.
– For most other general cases, Loading just the field and then processing Fieldvalues is fastest.

Of course, there are a number of factors beside the data profile that can impact the results. Disk speed, number of cores, etc. I’d love to hear if you have different experiences.

My colleagues at the upcoming October 9 Masters Summit for Qlikview in London and Barcelona may have some thoughts of their own on this question. In addition to core topics, Oleg is presenting “Performance Tuning” and Bill is presenting “Tips & Tricks”. Both sessions should provide some practical pointers for speeding up your apps. There is still time to register for the Summit. If you attend, perhaps we can continue this discussion at an evening “Pub Session”.



16 thoughts on ““Fastest” Method to Read max(field) From a QVD”

  1. Hi,

    I’ve tried the example with QVD-Field/FieldValues and I can’t get it to work. It only seems to generate a ?? value. Can you post and example file?


  2. Hi Rob,
    thanks for this interesting test application. If the data are ordered like in this case (Id=RecNo) I guess the fastest option should be QVD-Field/Peek.. 😉
    Best regards,

  3. Btw. there is a bug in the QVW download, the field name
    “Test” should be “TestSource”,
    “Factor” should be “TestFactor”
    to get the UI working right:

    SUB End
    ‘$(vTestFactor)’ as Factor,
    ‘$(vTestSource)’ as Test,
    $(vTestStart) as TestStart,
    GetTime() as TestEnd
    AutoGenerate 1

  4. Hi Ralf,
    A QVD, due to delta loads, is not likely to be ordered by Id. However, that the delta rows would be on top and sorted DESC in which case the fastest option may be:
    FIRST 1 LOAD Id…;

    P.S. I thought a reload would have taken care of that UI fieldname change, but I’ll upload another copy along with some more tests.

  5. Hi Rob, I observed some times FieldValue & IterNo gave good performance over Resident:

    LOAD FieldValue(‘ID’,IterNo()) AS TEMP
    Autogenerate 1 while NOT IsNull(FieldValue(‘ID’,IterNo());

  6. Hi Rob,
    I simply can’t see where I’m going wrong with your method for finding the largest record of a field in a resident table. I’ve given up trying in a large real life example and been reduced to trying a simple INLINE table but still my script is wrong.
    In this example I expect to obtain a table tblMaxNum with a single field MaxNum of value 45 but instead I get a value of ‘?’.

    If you can spare a minute or two to relieve my torment I will be forever grateful.


    LOAD max(FieldValue(‘Num’,RecNo())) as MaxNum
    AutoGenerate FieldValueCount(‘Num’);

    I can get the suggestion of the previous poster (Sree M, thanks) to work but not yours
    LOAD FieldValue(‘Num’,IterNo()) AS TEMP
    Autogenerate 1 while NOT IsNull(FieldValue(‘Num’,IterNo()));


    Andrew Walker

  7. Hi Andrew,
    You couldn’t get my example to work because the example was wrong. The preceding load got lost in a migration between hosts. Sorry for that. It’s been corrected and here’s how your example should look.

    LOAD max(Num) as MaxNum;
    LOAD FieldValue(‘Num’,RecNo()) as Num
    AutoGenerate FieldValueCount(‘Num’);

  8. Hi Rob –
    I’ve tried the suggested solution I must say it’s critically fine. Compared with the “Resident” method it reduced the elapsed time by 99% or more. Objective was to find the max date on a base table of approximately 3.5 MIO rows.
    Really fine, thank you!

  9. Hi Rob

    Implemented the following:

    LOAD Id FROM data.qvd(qvd) WHERE NOT EXISTS(Id);
    LOAD max(Id) Resident Max;

    And had a 32.5% improvement compared with:

    LOAD Id FROM (qvd);
    LOAD max(FieldValue(‘Id’, recno()))
    AUTOGENERATE FieldValueCount(‘Id’);

    Thanks 🙂

  10. Hi Rob –

    Is it possible to implement this method when we have to find max by another field? For instance, if I have to get max(Sales) by region. Is it doable using above method?


    1. Hi Sunny,

      The fieldvalues() method would not work when grouping by another field. FieldValues() works on the symbol tables. The linkage between a “Sales” values and a “Region” values require that you go through the data by row,

  11. Hi

    I know this is an old topic. But in relation to Sunnys question – have you done some simarlar testing on best performance on max values on dimensions.

    I have a data set on approximately 8 billions rows of changing data – and need to find the min and max values for 4 millions different dimensions. So if someone already have done some simalar testing im very interested to learn about the findings ?


Leave a Reply

Your email address will not be published. Required fields are marked *