Summary: While preceding load is a powerful tool, the current performance penalty may cause you to reconsider using it for anything but the smallest of data sets.
I’m a big fan of the “Preceding Load” feature in Qlik Script. This is the facility that allows us to stack a LOAD statement on top of a SQL statement, or stack two or more loads to simplify coding.
I’ve generally thought there is no downside to using Preceding Load. I believed preceding load would always be faster than multiple load resident — even when presented with examples to the contrary (credit to Sandro Krumbein, Bill Markham, others).
When I first tested the performance of preceding load in QV8 and QV9, I detected very little overhead. With QV11 and QV12, the penalty (for LOAD on LOAD) is extremely large, such that I currently avoid using preceding except in small data sets.
Let look at a simple test case, that of creating an additional field when loading a QVD. Admittedly thiis is not a good candidate for preceding load, but it should be a viable test case.
Creating the field without preceding:
LOAD *, A&B as B2 FROM data.qvd (qvd)
Creating with preceding:
LOAD *, A&B as B2; LOAD * FROM data.qvd (qvd)
Both statements will result in a non-optimized load. Here is time in seconds to perform each statement in QV version 9, along with an optimized load for comparison.
The preceding load takes 28% longer, a significant amount. However, back then I was more worried about the loss of the optimized load. That was a difference worth caring about!
Now lets add in the test results for versions 11 & 12.
Overall, a big improvement in total time vs V9, especially in the non-optimized load. But a significantly larger penalty when using preceding. Why?
Before I go on, I realize you may be wondering about the performance of a LOAD on SQL like:
LOAD *; SQL SELECT * FROM ...;
My testing show this incurs approximately a 4% penalty, so not to worry for LOAD/SQL.
Back to LOAD on LOAD from QVD or Resident. Wouldn’t it be great if we could spend only 4% more for the preceding, instead of the %230 in V12 above?
Where is the extra time going? I’ll simplify the test to create no extra fields. Just a
LOAD *; LOAD * FROM data.qvd(qvd);
For comparison, I try 1, 2 and 3 “LOAD *;” statements stacked on LOADs from QVD and Resident.
I see a large increase in the first preceding, followed by smaller increases for each subsequent LOAD. The numbers suggest, contrary to popular teaching, that a series of resident loads may be faster than preceding load.
Where is that time going? Let’s look at IO and CPU counters for each each test.
The additional time all seems to be CPU related. Interesting to me, the “Read Operations” count for the QVD Preceding seem to indicate that QVD is actually read using the optimized block IO technique.
What’s happening with that extra CPU time? Is it required or is it something that can be improved? I’ll try to ask R&D folks this question at the upcoming Qonnections May 2.
In the meantime, let me know if you’ve seem similar or different results in your load scripts when using preceding load.