Super Fast Method to Retrieve QVD High Value

Delta Load, sometimes called Incremental Load, is the technique of pulling only changed or new rows from a database and then merging those rows with a master QVD. The Delta Load pattern follows these general steps:

1. Determine high water mark (“last reload”)
2. Build WHERE predicate in syntax of target DB.
3. SQL SELECT delta rows.
4. Merge delta rows with QVD.
5. If Deletes, INNER JOIN entire set of Keys from DB with QVD

The first step is to determine what cutoff value — delta point — do we want to pass in the SQL SELECT WHERE clause to identify new rows. This value is usually the highest value in the existing QVD.

The most robust and reliable method for step one is loading with max() from the existing QVD. For example:
LOADmax(LastModified) asMaxModifiedFROMmyqvd.qvd (qvd); 

This works well, but reading the entire QVD can take a very looooong time for a large QVD.

A much faster method is to aggregate the max value for the Delta rows only and then save that value on the first row of the QVD. In subsequent delta loads, only the first row of the QVD is read to get the value. This is extremely fast and is not effected by QVD size. For example:

Facts:
SQLSELECT * 
FROMmytableWHERELastModified>= ‘$(vMaxModified)’;

JOIN(Facts)
LOADmax(LastModified) asHighWaterValue
RESIDENTFacts;
// Do QVD Merge…and STORE Facts INTO Facts.qvd
Retrieve value later with:
 FIRST 1 LOADHighWaterValueFROMFacts.qvd(qvd

The “HighWaterValue” field will contain a relatively small number of values and will therefore have a negligible impact on the size of the QVD. You will of course have to create the field the first time before trying to load it with the “FIRST 1…”.

If you are using Qlikview Components (QVC) V8+ for delta loading, you can enable this high performance strategy by setting the variable:

SETQvc.Loader.v.StoreMaxModFieldValue = -1; 

QVC will automatically create a high water field named Qvc.MaxModFieldValue
and detect the availability of the field on subsequent loads. There is no need to pre-create the field before turning on this feature.

The technique is part of the Advanced Scripting material I’ll be presenting at the upcoming Masters Summit for Qlikview in Europe this October. I’ll be discussing more about Delta Loads, including strategies for merging large QVDs and tables with non-unique keys. Review the full agenda here and decide if the Masters Summit might help take your Qlikview skills to the next level.

Share

16 thoughts on “Super Fast Method to Retrieve QVD High Value”

  1. Interesting approach. The way I handle this is in my fork of the Qvc is by generating a separate qvd file to store the high watermark (e.g. Fact.delta.qvd). The existence of the delta file also serves as a way of forcing a full reload for a single table (i.e. if it’s not there, then a full reload is triggered for that entity).

  2. Hi Graeme,

    I’ve used the “side file” approach in the past but run into problems when a qvd is restored from backup or copied from another environment — and the proper side file is not restored as well.

  3. Hi Rob,

    We only ever move application code between environments (not config or data), and this is all done through a version control and deployment layer, so it has never been problem for us. Thanks for the post though – always interesting to see other approaches and techniques!

    Cheers,

    Graeme

  4. Smith, What kind of version control you use for QV ? I am intrested in understanding version control. Is it default QV server 11 or ?

  5. I found out that this:

    LOAD max(LastModified) as MaxModifiedFROM myqvd.qvd (qvd);

    is a lot slower than this:

    LOAD * FROM myqvd.qvd (qvd);
    and then a sort and a peek. the optimized load is a lot faster that a common load.

  6. What about:
    DATA:
    LOAD DISTINCT LastModified FROM myqvd.qvd (qvd);

    Load Max(LastModified) AS MaxLastModified RESIDENT DATA;

    A Load DISTINCT on a single fleld is still an optimized load so it’s quite fast

  7. Nice!

    In the second last paragraph, there is “There need to pre-create”

    Did you mean to say “There is no need to pre-create…” 🙂

  8. @David: A distinct load from a QVD may look optimized in the progress window, but there is overhead after the optimized load while QlikView is determining the distinct values. Note the hang after the rows stop loading, the fact that the full, not distinct, number of rows says it is loaded in the progress window, or check the log file for the time gap while QlikView is thinking.

    The absolute fastest way to get the distinct values out of a QVD is:
    LOAD FieldName
    FROM Sample.qvd (qvd)
    WHERE NOT Exists(FieldName);

    (This assumes there is not already a field of that name in memory, in which case you would need to alias as you load and utilize the second Exists parameter.)

  9. I have a couple of different approaches.

    If you can, if you ensure that the incoming data is sorted (order by X), then the last record will always contain the most recent date:

    Peek(‘trans_date’, -1, ‘DataTable’)

    I also have an approach using Script variables where I store the last run date in a variable and then calculate the current date in another and then use those in the SQL query. Once the script has completed, I update the last run variable with the value from the current date. Therefore it doesn’t rely on calculating the date from the data.

    Regards,

    Stephen

  10. @Stephen,
    You can store the QVD with the delta rows sorted descending modified. If you’re willing to assume the QVD is sorted this way, you can read just the first row with
    FIRST 1 LOAD Modified FROM (qvd);

    I’ve never thought this reliable enough as a general approach, hence the idea of storing an additional field.

  11. Como hago puedo mostrar un informe con las fechas actuales y otro por periodos cerrados, me explico, Hoy hago una consulta y me arroja unos resultados pero yo quiero saber como fue el cierre al 30 de septiembre.

  12. Como hago puedo mostrar un informe con las fechas actuales y otro por periodos cerrados, me explico, Hoy hago una consulta y me arroja unos resultados pero yo quiero saber como fue el cierre al 30 de septiembre.

  13. Hola, me gustaría que alguien me diga como haga para sacar de una tabla que en su nombre esta dos tipos de registro que quiero tener por separado. Me explico, tengo la tabla de usuarios, dentro de sus campos este el nombre, y dentro del nombre tiene dos tipos de registro que son coordinadores y Gestores quiero tenerlos en dos tablas diferentes. Como puedo hacer.

  14. Apologies for coming so late to this post but it may be worth considering making use of the FieldValue function to replace the slow

    LOAD max(LastModified) as MaxModified FROM myqvd.qvd (qvd);

    with something like this
    TempTbl:
    LOAD LastModified as FROM myqvd.qvd (qvd);

    Last:
    LOAD max(Temp) as Last;
    Load FieldValue(‘LastModified ‘,IterNo()) as Temp
    AutoGenerate 1 While Not IsNull(FieldValue(‘LastModified ‘,IterNo()));

    Let vLastModified = Peek(‘Last’,0,’Last’);

    DROP TABLES TempTbl, Last;

    It won’t be as fast as reading the first line of a qvd, that will be best for very large datafiles but if you are reluctant for any reason to follow that approach I’ve found that this method is about five times faster than the basic ‘load the max value from qvd’ method.

    All the best

  15. Andrew, the alternate is like below:

    Load
    FieldValue(‘FieldName’,RowNo()) AS DateKey
    Autogenerate FieldValueCount(‘FieldName’)
    ;

Leave a Reply to Rob Wunderlich Cancel reply

Your email address will not be published.