The Cost of Preceding Load

Summary: While preceding load is a powerful tool,  the current performance penalty may cause you to reconsider using it for anything but the smallest of data sets.

I’m a big fan of the “Preceding Load” feature in Qlik Script. This is the facility that allows us to stack a LOAD statement on top of a SQL statement, or stack two or more loads to simplify coding.

I’ve written about the beauty of  preceding load in 2009 and 2014   HIC praised the feature in the Qlik Design Blog in 2013.

I’ve generally thought there is no downside to using  Preceding Load. I believed preceding load would always be faster than multiple load resident — even when presented with examples to the contrary (credit  to Sandro Krumbein, Bill Markham, others).

When I first tested the performance of preceding load in QV8 and QV9, I detected very little overhead.  With  QV11 and QV12, the penalty (for LOAD on LOAD) is extremely large, such that I currently avoid using preceding except in small data sets.

Let look at a simple test case, that of creating an additional field when loading a QVD.  Admittedly thiis is not a good candidate for preceding load, but it should be a viable test case.

Creating the field without preceding:

LOAD *, A&B as B2 FROM data.qvd (qvd)

Creating with preceding:

LOAD *, A&B as B2; LOAD * FROM data.qvd (qvd)

Both statements will result in a non-optimized load.  Here is time in seconds to perform each statement in QV version 9,  along with an optimized load for  comparison.

The preceding load takes 28% longer, a significant amount. However, back then I was more worried about the loss of the optimized load. That was a difference worth caring about!

Now lets add in the test results for versions 11 & 12.

Overall, a big improvement in total time vs V9, especially in the non-optimized load. But a significantly larger penalty when using preceding. Why?

Before I go on, I realize you may be wondering about the performance of a LOAD on SQL like:


My testing show this incurs approximately a 4% penalty, so not to worry for LOAD/SQL.

Back to LOAD on LOAD from QVD or Resident.  Wouldn’t it be great if we could spend only 4% more for the preceding, instead of the %230 in V12 above?

Where is the extra time going? I’ll simplify the test to create no extra fields. Just a

LOAD * FROM data.qvd(qvd);

For comparison, I try 1, 2 and 3 “LOAD *;” statements stacked on LOADs from QVD and Resident.

I see a large increase in the first preceding, followed by smaller increases for each subsequent LOAD.  The numbers suggest, contrary to popular teaching, that a series of resident loads may be faster than preceding load.

Where is that time going? Let’s look at IO and CPU counters for each each test.

The additional time all seems to be CPU related.  Interesting to me, the “Read Operations” count for the QVD Preceding seem to indicate that QVD is actually read using the optimized block IO technique.

What’s happening with that extra CPU time? Is it required or is it something that can be improved? I’ll try to ask R&D folks this question at the upcoming  Qonnections May 2.

In the meantime, let me know if you’ve seem similar or different results in your load scripts when using preceding load.



16 thoughts on “The Cost of Preceding Load”

  1. Hi Rob,
    Same here: currently developping a QV app (v 11 SR13) in a PoC. Loading 6m records. Initial load from QVD: ~1:30 min. Adding preceding load incl. two new fields: ~8 min. Went back to store in-memory table / read again from QVD instead of preceding load. Way faster, still faster than resident load. However, for small datasets such as calendar, I love and keep using preceding load.

  2. Rob – just an observation here — I find it interesting in your testing successive 1, 2 and 3 preceding loads that it appears to scale linearly. That is, there is a big penalty for the first preceding load, then a much smaller penalty for each subsequent preceding load.
    I have no idea what this means “under the hood” but it does strike me as unusual.
    Great post! Thanks,

    1. What is also strange is that the penalty on the first preceding load is so much higher on the resident load. The loads are quite similar and on 2nd and 3rd it is just marginally higher.


  3. Rob, we have observed the same. With small data, preceding loads may not be an issue, and worth the tradeoff for simpler, more readable code, but for large data, we have been advising storing complex or repeated code snippets in variables, then using $-expansion in the load itself to keep a script readable while eliminating the need for a preceding load.

  4. Hi Rob,

    Great post, i think is very useful to remember people about the efficiency of QV applications.

    Do you have the results for preceding load in cases where its impossible to calculate fields on first load? I’m thinking in cases like when you needed an “Order by” in table for Previous() Function or Peek.

    In this case the test will be something like this:

    1- LOAD *, A&B as B2 FROM data.qvd (qvd);
    2-Temp: LOAD * FROM data.qvd (qvd);
    Temp2: LOAD *, A&B as B2 Resident Temp;
    DROP TABLE Temp;

    Thanks in advanced!

    Best Regards,
    João Gonçalves

  5. Great!!! It was about time!!!
    Are we in a dismantling myths era.
    One thing is sure: Complicating a hell the thing, saves your job for a while more. Just place incomprehensible code as much as you can, like adding 3 and taking 3 away.
    I’m a simple person but it’s impossible to survive nowadays being that way. Complicate, complicate and complicate. Don’t they just sell a telephone, they sell a telephone full of applications that you don’t use, you don’t know what they are for and most of it, you can’t uninstall, ha, ha, ha. That’s the point. 100 million copies sold in a weak. With Qlik is the same. It used to be a simple application. That was the kick ass of it.
    I’ve said.

  6. Thanks Rob! I just cut 25% off my load time converting preceeding loads (QV11) to stacked resident loads.

    Load * from qvd;

    a as AAA,
    b as BBB,
    resident T1;

    …resident T2;
    …resident T3;

    drop tables T1,T2,T3,T4;
    rename table T4 to T1;

  7. Hi Rob,

    I’ve found the same issue about a half a year ago when I was transforming 4.5b of rows.. I haven’t tested impact in numbers, but since that time I’m avoiding Preceding load.


  8. This was somewhat of a revelation for me a few years ago.
    I found it quite counter-intuitive as I would not have assumed that iterative resident / store-load transformations would be quicker at all.
    Even though its a bit messier to have these various Tmp tables the speed saving is more than substantial enough to justify the cost!

  9. Yes I can confirm I see the same thing i.e. preceding loads being substantially slower than non-preceding. On the CPU usage front the difference was highlighted when I compared a LOAD RESIDENT to a simple LOAD * preceding load. The LOAD RESIDENT did a good job of multi-CPU i.e. the work was spread across many CPUs (in fact on my 32CPU machine it was up there at 24 CPUs or so) but when I added in a simple LOAD * preceding LOAD the CPU usage went to at most 2 CPUs out of that 32. I can only deduce that ‘outer most’ (first to execute) LOAD in the preceding LOAD gets given one CPU to get its data from its source and it starts to pass it’s rows to the next executing phase in the preceding LOAD and that gets the other CPU. Pure conjecture on my part but it has made me far more aware of CPU usage and multi-CPU capability of certain operations or functions.

    1. Oh I should have added the lack of multi-CPU-ability for preceding LOADs explains why there is a less of a performance hit noticed on a LOAD * attached to SELECT SQL statement. Since SELECT SQL statements tend themselves to a single CPU operation then preceding LOADs has limited impact on them.

Leave a Reply

Your email address will not be published. Required fields are marked *