Incremental Load using SQL Server “timestamp” Data Type

Incremental Load (extracting only new or changed rows from the database) requires a table column that identifies when a row has been updated. This is usually a datetime column like “LastUpdate”. The script extracts the max timestamp from the existing QVD and uses it to create a select predicate like:

WHERE LastUpdate >= ’01-20-2010 13:55:01′

If the database is Microsoft SQL Server, you may have another option for identifying updated rows. A table column of type “timestamp” is incremented whenever a row is changed. The name “timestamp” in somewhat confusing, as this column does not contain time values. Here is the description from the 2005 Reference:

“timestamp is a data type that exposes automatically generated, unique binary numbers within a database. timestamp is generally used as a mechanism for version-stamping table rows. The storage size is 8 bytes. The timestamp data type is just an incrementing number and does not preserve a date or a time. To record a date or time, use a datetime data type.”


In Sql  Server  2008, the type “rowversion” was introduced as an alias for “timestamp”. Rowversion is preferred for 2008 and forward.


If your table does not already have a timestamp column, you can add one with an ALTER DDL statement, For example, to add a timestamp column named “Revision” to a table named “Orders”:


alter table Orders ADD [Revision] [timestamp] NOT NULL;


The Revision column will be returned by default as a hexadecimal string, so it’s easiest to convert to an integer before storing in QV. 

SQL SELECT 
    OrderID, etc, 
    cast(Revision as bigint) as Revision
FROM Orders …

In subsequent incremental loads, add a predicate to select only rows greater than the last revision. 

// Select the max value from the qvd
tempmax:
LOAD max(Revision) as Revision FROM myqvd.qvd (qvd);
LET vMaxRevision = PEEK(‘Revision’);
DROP TABLE tempmax:

SQL SELECT 
  OrderID, etc,
  cast(Revision as bigint) as Revision
FROM Orders
WHERE Revision > $(vMaxRevision);

I find the timestamp value, when available, to be easier to use than a datetime column. It’s just a numeric, so no literal formatting is required. 

Because it’s a precise and unique value, you avoid the “always one row” problem. When selecting from a datetime, you usually have to specify “>=” because a datetime is not a unique value. This means that a select will return at least one row, even if there were no real updates. 

With timestamp, you can specfy “>” which will return zero rows if there were no updates. Knowing that you had no updates can be useful in controlling the execution of further processing steps. 

Rob 

Share

6 thoughts on “Incremental Load using SQL Server “timestamp” Data Type”

  1. Great post Rob.

    One question, I’m not sure if your peek statement will work correctly.

    I would code like this:
    // Select the max value from the qvd
    tempmax:
    LOAD max(Revision) as Revision FROM myqvd.qvd (qvd);
    LET vMaxRevision = PEEK(‘Revision’);
    DROP TABLE tempmax:

  2. Rob,

    I’m fairly new to Qlikview, so forgive me if I’m way off base. If the datasource is DB2, and we have a timestamp on updated records in our qvd, does this incremental load solution extract that from the database and update the qvd’s records? or does it just load new records into the qvd?

  3. A top tip.

    By being able to use an auto generated field it avoids those cases where, say, a stored procedure fails to update the ‘Last Updated Date’.

    I had to use the cast on the Where clause also, e.g.

    SQL SELECT
    OrderID, etc,
    cast(Revision as bigint) as Revision
    FROM Orders
    WHERE cast(Revision as bigint) > $(vMaxRevision);

    Regards,

    HD

  4. Hi,
    currently i’m trying to implement incremental load in postgre but i’m having an error message “ErrorSource: Microsoft OLE DB Provider for ODBC Drivers, ErrorMsg: ERROR: syntax error at or near “6”;
    Error while executing the query”

    what i supposed to do?

Comments are closed.