Category Archives: Scripting

QVD Questions and Answers

On Feb 2 I presented a live webinar titled “Understanding & Best Practices with QVD Files”. You can download the slides as a PDF.

Many questions were submitted during the presentation, too many to answer at that time. I’ve tried to answer those questions and others below. If you have further questions about QVDs, post a comment here or in the QlikCommunity Forum.

Q: Does the QVD data get stored in an RDBMS like Oracle, or is it in a file system?
QVD files are stored in the file system.
Q: What is the compression factor for QVDs?
QVD files are stored uncompressed. A QVD contains the physical representation of an in-memory Qlikview table. This “RAM image” format is what allows an optimized QVD load to be so quick. The physical blocks of the file are read from disk directly into RAM, “ready to go”. Because a QVD is the RAM image, there is no compression.
Q: Can we trace a QVD back to its source?
As of QV10SR2, the XML header in a QVD file contains the name of the QVW that created the QVD as well as file sources and database connections/SQL statements.
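Since the header is plain XML, you can even read it from script. A minimal sketch, assuming a QVD named mytable.qvd (QvdTableHeader and CreatorDoc are names from the QVD header layout):

// Read lineage information from the QVD's XML header
LineageInfo:
LOAD CreatorDoc
FROM mytable.qvd (XmlSimple, Table is [QvdTableHeader]);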
Q: Why is sorting not possible while loading a QVD?
Sorting (ORDER BY) is only possible with Resident (already in memory) tables. Sorting is not possible when reading from files.
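If you need a sorted result from a QVD, load it first and then sort with a RESIDENT load. A minimal sketch (the table and field names are examples):

Temp:
LOAD F1, F2 FROM sometable.qvd (qvd);

Sorted:
NOCONCATENATE LOAD F1, F2 RESIDENT Temp
ORDER BY F1;
DROP TABLE Temp;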
Q: Could you go over again the concept of “forcing” an un-optimized load for the MAPPING function, with respect to the QVD?
MAPPING tables may be loaded from a QVD, but it must be an un-optimized load (this is sometimes called “unwrapping”).
MyMap:
MAPPING LOAD F1, F2 FROM sometable.qvd (qvd);
The above mapping table will be created but it will appear to be empty when used in MAP USING or ApplyMap().  No error, just no resulting mapping.  One workaround is to create a condition that will cause an un-optimized load.  We want all the rows, so we create an always-true condition that will return all rows.
MyMap:
MAPPING LOAD F1, F2 FROM sometable.qvd (qvd)
WHERE 1=1;

Note: In QV10+, the MAPPING prefix will trigger an unoptimized load. The 1=1 trick is not necessary.

A corollary to this is that the target of a mapping operation cannot be an optimized QVD load.
MAP Country USING MyMap;
// Optimized load, Country will not get mapped.
LOAD Customer, Country FROM customer.qvd (qvd);
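The fix is the same WHERE 1=1 technique, which forces an un-optimized load of the target:

MAP Country USING MyMap;
// Un-optimized load, Country will get mapped.
LOAD Customer, Country FROM customer.qvd (qvd)
WHERE 1=1;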
Q: Is the use of WHERE 1=1 something that will remain good for mapping fields in the future, or is it possible that Qlikview will determine that WHERE 1=1 allows an optimized load?
Good question. We use WHERE 1=1 to force the un-optimized load required by MAPPING LOAD. I’m hopeful that if Qlikview is ever changed to treat WHERE 1=1 as eligible for an optimized load, it will also recognize that MAPPING LOAD requires a non-optimized load.
(Note: In QV10, MAPPING LOAD is automatically non-optimized).
Q: How would you handle the need to load multiple models (i.e. multiple qvws)? I don’t think you can do multiple binary loads, so what do you recommend?
You can generate QVDs from each model and then load all the QVDs to form the larger model.  You can generate all QVDs from a qvw with a simple loop. You can add this code to each of your model qvws.
FOR i = 1 to NoOfTables()
  LET vTableName = TableName($(i)-1);
  LET vOutfile = '$(vTableName).qvd';
  STORE [$(vTableName)] INTO [$(vOutfile)] (qvd);
NEXT i
Q: How are QVD refreshes scheduled?
QVDs are created by script in a QVW executed by the reload process. Schedule the reload as you would the reload of a user-facing qvw, using the Qlikview Enterprise Management Console (QEMC) or a batch file.
Q: Is QVD optimized load really worthwhile, since it is fairly limited? In other words, should we load data into memory striving for QVD optimized loads, and then work with the memory tables within the script?
Optimized vs non-optimized load has two impacts: Load duration and Server RAM usage. If your application is relatively small or you do not have concerns about the impact, don’t spend time trying to maintain an optimized load. Some of the script techniques used to maintain optimized can make your script harder to follow.
If, for a given document, you have concerns about load duration or RAM usage, then making the effort to maintain an optimized load would be worthwhile.
Q: Can a QVD be accessed from an AS400 DB2 database to get some data?
Nothing but Qlikview can read from QVDs, so no, DB2 cannot read directly from a QVD.
In the same script that creates the QVD with the store statement:
STORE mytable INTO mytable.qvd (qvd);
You can also create a CSV copy for other consumers:
STORE mytable INTO mytable.csv (txt); 
The csv file can be read by any number of programs, including an ODBC text driver or a bulk database loader. You can use QV to do the ETL and then push csv files back into a Data Warehouse, using something like SQL Server DTS or other data pump.
Q: I’m pulling data from a database over a slow WAN link. Would using a qvd speed this up? If so, would the qvd file reside on the same side as the database or at the other end of the WAN link (client side)?
Using QVDs could speed up your overall process by allowing multiple reloads to load from the qvd instead of going to the database over the slow WAN link. The QVD should live at the client end of the link – where the qvw is reloading.
Q:  If the data source is constantly changing (such as portfolio management software) can we refresh qvds frequently? Will this overburden the process?
QVDs may be refreshed frequently. Exactly how frequently depends on your data volumes and architecture. Refreshing every 30 minutes is common, and I have seen intervals of 5 minutes.  Frequent refresh of large volumes usually requires incremental load, which is covered in the Reference Guide and the Forums.
Q: If I add the BUFFER prefix before each load statement pulling from a DBMS, does the first execution pull from the DB, with all subsequent executions being incremental loads that pull from a file-system batch of QVDs?
The BUFFER prefix does not provide incremental load when loading from a DBMS.  Subsequent reloads will load from the buffered file system QVD, but new rows will not be fetched from the database.
When used with a load from txt files, BUFFER will provide automatic incremental load. Subsequent reloads will add new data from the file to the buffered QVD.
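A sketch of the file case (the file name and format options are examples):

// First reload creates the buffer QVD; later reloads append only new rows
BUFFER (incremental) LOAD *
FROM mylog.txt (txt, codepage is 1252, embedded labels, delimiter is ',');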
Q: I am running SBE Server so documents are reloaded right from the Documents folder. What is your recommendation for location of the QVD generator documents? In other words, do you place them in the Documents folder alongside your production QVWs?
I recommend putting the QVD generators in a separate “Loaders” folder. Make this a mounted folder in QVS and schedule reloads as needed. Use NTFS permissions to hide the folder from standard AccessPoint users.
A number of questions were asked about the QVX format. I haven’t had much experience with QVX yet. Rob Patterson has indicated he will schedule a QlikLearn webinar specifically on the topic of QVX.
Q: Does QVX also have two types of load, optimized and not optimized?
No, optimized load applies only to QVD.
Q: What are the other differences between QVD and QVX?
QVD is a proprietary file format provided by QlikView for storage. Only QlikView software can read and write to QVD files.
QVX is an open file format for the storage of Qlikview data. A customer or third party can create QVX files on any platform, without needing Qlikview software.
QVD files will typically load faster than a QVX file.
Q: Is QVX used as a source for other systems, or is it used to pull data from source systems that have no ODBC provider?
The use cases for QVX are still being discovered, and I’m sure we’ll see some interesting uses.  The scenario I currently understand is to provide data to Qlikview when there is no ODBC provider.
Q: How can I create a QVX?
Q: How do you write out to a QVX?
Q: How can you read a QVX from software other than Qlikview?
Documentation of the internal QVX format is available in the Qlikview SDK. The SDK can be installed from the Qlikview Server installation package. Also look for examples in the “Share Qlikviews” section of QlikCommunity.
You can also create a QVX with a script STORE statement:
               STORE mytable INTO mytable.qvx (qvx);
This is useful to generate a sample QVX for examination or testing.
-Rob 

Incremental Load using SQL Server “timestamp” Data Type

Incremental Load (extracting only new or changed rows from the database) requires a table column that identifies when a row has been updated. This is usually a datetime column like “LastUpdate”. The script extracts the max timestamp from the existing QVD and uses it to create a select predicate like:

WHERE LastUpdate >= '01-20-2010 13:55:01'

If the database is Microsoft SQL Server, you may have another option for identifying updated rows. A table column of type “timestamp” is incremented whenever a row is changed. The name “timestamp” is somewhat confusing, as this column does not contain time values. Here is the description from the SQL Server 2005 documentation:

“timestamp is a data type that exposes automatically generated, unique binary numbers within a database. timestamp is generally used as a mechanism for version-stamping table rows. The storage size is 8 bytes. The timestamp data type is just an incrementing number and does not preserve a date or a time. To record a date or time, use a datetime data type.”


In SQL Server 2008, the type “rowversion” was introduced as an alias for “timestamp”. Rowversion is the preferred name for 2008 and forward.


If your table does not already have a timestamp column, you can add one with an ALTER TABLE DDL statement. For example, to add a timestamp column named “Revision” to a table named “Orders”:


alter table Orders ADD [Revision] [timestamp] NOT NULL;


The Revision column will be returned by default as a hexadecimal string, so it’s easiest to convert it to an integer before storing in QV.

SQL SELECT 
    OrderID, etc, 
    cast(Revision as bigint) as Revision
FROM Orders …

In subsequent incremental loads, add a predicate to select only rows greater than the last revision. 

// Select the max value from the qvd
tempmax:
LOAD max(Revision) as Revision FROM myqvd.qvd (qvd);
LET vMaxRevision = PEEK('Revision');
DROP TABLE tempmax;

SQL SELECT 
  OrderID, etc,
  cast(Revision as bigint) as Revision
FROM Orders
WHERE Revision > $(vMaxRevision);
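To complete the incremental pattern, the fetched delta is typically merged with the prior QVD and stored back. A hedged sketch, assuming the select above loaded into a table named Orders and that OrderID is the primary key:

// Append prior rows whose keys were not re-fetched,
// so updated rows are kept only in their new version
CONCATENATE (Orders)
LOAD * FROM myqvd.qvd (qvd)
WHERE NOT EXISTS(OrderID);

STORE Orders INTO myqvd.qvd (qvd);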

I find the timestamp value, when available, to be easier to use than a datetime column. It’s just a numeric, so no literal formatting is required. 

Because it’s a precise and unique value, you avoid the “always one row” problem. When selecting from a datetime, you usually have to specify “>=” because a datetime is not a unique value. This means that a select will return at least one row, even if there were no real updates. 

With timestamp, you can specify “>”, which will return zero rows if there were no updates. Knowing that you had no updates can be useful in controlling the execution of further processing steps.
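A sketch of that control logic, assuming the delta was loaded into a table named Orders:

// Skip the remaining steps when nothing was updated
IF NoOfRows('Orders') = 0 THEN
  TRACE No new revisions - exiting.;
  EXIT SCRIPT;
ENDIF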

Rob 


The Now() Trap

The Now() function operates differently in script and charts. This post will highlight one difference that has tripped up even experienced developers.

Our example script requirement is to extract rows of data where the “TransactionDatetime” field value is within the last 24 hours. We’ll use the “Now()” function to establish the current time. Here’s the doc from the QV Ref Guide.

now( [ timer_mode ] )

Returns a timestamp from the system clock. The timer_mode may
have the following values:
  0 Time at script run
  1 Time at function call
  2 Time when the document was opened

Default timer_mode is 1. The timer_mode=1 should be used with
caution, since it polls the operating system every second and hence
could slow down the system.
 
Which timer_mode option do you choose? Hmm… you want 24 hours prior to the script execution so it seems like “0” is the best option. Plus there is that scary warning about “1” slowing down the system.

The integer 1 represents a day — 24 hours — so you code a where clause in your LOAD like:
 
WHERE TransactionDatetime >= now(0) -1
 
During development, you reload several times. You examine the data selected and it seems to be working — you are getting data from the last 24 hours. It’s 4:00pm Tuesday afternoon and you promote this to the server and set a reload schedule of daily at 8:00am. A test run on the server shows the data selection is working as planned.
 
The document reloads on schedule at Wednesday 8:00am. A review of the app shows data going back to Monday at 4:00pm.  After Thursday morning’s reload the data range is Tuesday 8:00am to Wednesday 8:00am. What happened to the rest of Wednesday and all of Thursday?
 
The value for Now(0) is set at the end of script execution. When used in script, Now(0) returns the end time of the previous script execution, not the time of the current script execution.
 
So on Thursday morning Now(0) returns Wednesday 8:00am — the end of the last execution. That is not what we were looking for.  
 
In Charts, Now(0) returns the end of the latest script execution, which would be Thursday 8:00am.
 
Now(1) always returns the time when the function is executed — the “current” time. During development when reloads are frequent, script Now(0) is usually pretty close to Now(1) and you may not notice the difference. But when a document goes on a daily reload schedule, Now(0) is usually an entire day behind what you want!
 
The correct choice in script is generally Now(1), which returns the time when the function is actually executed, not a time related to previous reload.
 
To summarize: use Now(1) in script, Now(0) in charts.

If you need to establish a deterministic (consistent throughout the entire script) value for “Now” in script, set a variable at the beginning:
 
LET vScriptStart = Now(1);
 
Then use vScriptStart as the reference point in your script. It may be used in charts as well.
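A usage sketch, reusing the earlier 24-hour requirement (Num() keeps the variable purely numeric, so the dollar expansion is safe in an expression):

LET vScriptStart = Num(Now(1));

LOAD ...
WHERE TransactionDatetime >= $(vScriptStart) - 1;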

-Rob


Use cases for Generic Load

The Qlikview “Generic Load” is not frequently used and is therefore sometimes forgotten. Generic Load has some interesting applications and can be a useful item in your script toolbag.

Generic Load is the complement of “Crosstable Load”. In a loose sense, you could say that a Crosstable Load creates a Generic table and a Generic Load creates a Cross table.

Consider this table which contains a separate row for each Phase of a Project.

Now suppose you want to flatten this table to a single row per Project. You have a variable number of Phases per Project. The resulting data model should look like this:
The model above can be accomplished with two GENERIC LOADs against the ProjectTable:
GENERIC LOAD Project, 'Start Phase ' & Phase, StartDate
RESIDENT ProjectTable;

GENERIC LOAD Project, 'End Phase ' & Phase, EndDate
RESIDENT ProjectTable;

Generic Load creates additional Qlikview tables. The additional tables cannot be avoided by adding a CONCATENATE or JOIN prefix. In the next example I’ll offer a technique to consolidate the tables.

Here’s another application of Generic. Consider this example table.

Suppose you want to generate flag fields for each of the possible order statuses. The flags can be created with a single Generic Load.

Flags:
GENERIC LOAD Order, 'Status_' & Status, 1
RESIDENT OrdersTable;

The resulting data model now contains flags for each Order.

As mentioned previously, Generic Load creates additional tables. The table view after the above Generic Load is:

You can see Generic makes a new table for each new field it creates. That’s fine if it doesn’t cause synthetic keys or other problems. If you want to merge the Flag fields into the fact table (OrdersTable), you can do it after the Generic Load with a bit of code like this:
FOR i = NoOfTables()-1 to 0 STEP -1 
  LET vTable=TableName($(i)); 
  IF WildMatch('$(vTable)', 'Flags.*') THEN 
    LEFT JOIN (OrdersTable) LOAD * RESIDENT    [$(vTable)]; 
    DROP TABLE  [$(vTable)]; 
  ENDIF 
NEXT i


Here’s the table view after the Joins.

Generic load is not an everyday tool, but can prove useful in specific situations.

The qvw examples used in this post may be downloaded from here.

-Rob Wunderlich

Best way to count keys? Don’t.

I was recently reviewing a problem where a chart Count() function produced differing results between QV 8.5 and 9. The field being counted was a key field. Counting a key field without using DISTINCT, especially a one-to-many key, can produce ambiguous results and should be avoided.

The recommended approach is to add a counter field and sum() that field. For example, in an Orders table, add the field:

1 as OrderCounter

and then count Orders with:

sum(OrderCounter)

Yes, count(OrderCounter) will work as well. The Qlikview literature still states that sum() is faster and preferable to count(). John Witherspoon recently showed me some tests that demonstrate count() being faster than sum() in Version 9, so it’s possible that recommendation should be examined if you are working on a very large application.

-Rob


Simplify with Preceding Load

Most QV script developers are introduced to “preceding load” as a LOAD that precedes an SQL SELECT. But a LOAD may also precede another LOAD, which can be a very useful tool.

Let’s review a typical preceding load.

Table1:
LOAD Customer, Sales, today(1) as LoadDate ;
SQL SELECT Customer, Country, Sales FROM SalesResults ;

  • The absence of a “FROM” or “RESIDENT” clause in the LOAD is what makes this a “preceding load”.
  • The SQL SELECT will be executed first. The results of the SELECT will be used as input to the LOAD statement.
  • Table1 will have three fields — Customer, Sales, LoadDate.
  • The Field “Country” will not be present in Table1 because “Country” is not repeated on the LOAD statement.
  • The field “LoadDate” does not exist in the SQL SELECT and is added by the LOAD.



Let’s look at an example of where preceding load can be useful. When loading data, you may need to use expressions to parse or cleanse data. For example, extracting a timestamp from a string in a text file.

Table1:
LOAD
timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

What if you want additional time dimensions from the data? You could add expressions like:

date(date#(mid(@1:n, 3, 6), 'MMDDYY')) as EventDate,
month(date#(mid(@1:n, 3, 6), 'MMDDYY')) as EventMonth

The script would soon get messy with “paren-disease” and become harder to maintain. Preceding Load to the rescue.

Table2:
LOAD *,
floor(EventTime) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour
;
LOAD
timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

  • The syntax is greatly simplified by reusing the “parsed once” EventTime.
  • Table2 will contain six fields: Event, EventTime, EventDate, EventMonth, EventYear, EventHour.
  • The “*” in the top load includes the fields emitted by the bottom load — EventTime and Event.

Preceding Loads may also be stacked more than two deep as in this example.

Table2:
LOAD *,
if(match(EventMonth, 'Aug', 'Dec') OR weekday(EventDate) > 5, 'Holiday', 'Standard') as Rate;

LOAD *,date(floor(EventTime)) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour;

LOAD timestamp(timestamp#(mid(@1:n,3,12), 'MMDDYYhhmmss')) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

Preceding load is a useful tool to simplify the syntax of your script and make it easier to maintain.

-Rob


Understanding Join and Concatenate

The Qlikview script functions JOIN and CONCATENATE can sometimes be used to tackle the same problem, but there are important differences that should be understood.


Examine the sample tables below. Note that they share one common field name, “Key”. Also note that Table1 has a Key value “3” that is not present in Table2.

JOIN will combine rows where the Key value matches. The keyword OUTER will also retain rows that do not match rows in the other table. Here’s what the merged table will look like after an outer join.

OUTER JOIN (Table1) LOAD * RESIDENT Table2;


Values A1 and C1, which were in different tables, now occupy the same row in the result table. The row with Key 3 has missing values for C & D, because there was no matching Key in Table2.


Creating a chart that uses “Key” for dimension will produce results similar to the Table Box above.



The important point is that values with the same Key value have been merged together into a single row. If value A1 is selected, note that values C1 & D1 remain associated (white). The set A1,B1,C1,D1 is indivisible.

Now let’s look at Concatenate. Concatenate appends the rows of one table to another. Concatenate never merges any rows. The number of rows in a concatenated table is always the sum of the rows from the two input tables. Here’s what our sample data will look like after Concatenate.

CONCATENATE (Table1) LOAD * RESIDENT Table2;

Rows with like Key values are not merged together. The rows from Table2 are simply appended to Table1. Because the tables have different fields, rows will have null values for the fields from the “other” table.

If the data is used to build a chart that utilizes the common field “Key” as dimension, the chart looks just like the JOINed table.

Let’s make the selection “A1” in Field A and see its impact on our visible charts and tables.

When A1 is selected, the association to C1 & D1 is lost and C/D values become null in both the Chart and Tablebox. We cannot select both A1 and C1. This is a different result than the JOINed example.

Let’s consider a more realistic example where we may choose between JOIN and CONCATENATE. Consider the two tables below. Note that only one BudgetAmount row is present for each Region-Year combination. In the Sales table, the SalesAmount is broken down by Department within Region.

If we load both tables we can produce a chart using expressions like =Sum(BudgetAmount).

The Budget and Sales values have been summed correctly.

We then notice that we have an undesirable synthetic key, created by the Budget and Sales tables sharing the Year and Region fields. One approach to eliminate the synthetic key would be JOIN or CONCATENATE. But which one in this case?

Let’s try JOIN and see what the Chart looks like.

OUTER JOIN (Budget) LOAD * RESIDENT Sales;


The summed Budget numbers are incorrect!

A look at raw data of the joined table will identify the problem. The JOIN repeated the BudgetAmount value on each Department row.


Let’s try with CONCATENATE.

CONCATENATE (Budget) LOAD * RESIDENT Sales;


The numbers are now correct and we’ve accomplished the goal of eliminating the synthetic key.

A peek at the data in the concatenated table will make it clear why the chart is now correct. There is only one BudgetAmount value for each Year-Region combination.

JOIN and CONCATENATE are both very useful and frequently used functions in Qlikview. It’s important to understand the differences between them.

-Rob


BOQC: Flexible Interval Classification

Another post in the “Best of QlikCommunity” series.

In this QlikCommunity Forum thread the poster asked about using the class() function to create a dynamic dimension of 30 minute intervals in a chart. He wanted to format the class values as display friendly time ranges.

My solution would have been to use mapping to format the classes to the desired display format. However, this would have been a lot of data entry for 48 intervals in a day.

John Witherspoon posted a more elegant solution utilizing a simple expression.

dual(time(floor(timestamp, 1/48),'h:mm TT') & ' - ' &
time(ceil (timestamp, 1/48),'h:mm TT')
,floor(frac(timestamp),1/48))


Using John’s example, I was able to extend the idea to easily provide for a user-selectable interval size.
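A hedged sketch of that extension: hold the interval size in a variable (vInterval here is an assumed variable containing a fraction of a day, e.g. 1/48 for 30-minute buckets, perhaps set from an input box):

dual(time(floor(frac(timestamp), $(vInterval)),'h:mm TT') & ' - ' &
time(ceil (frac(timestamp), $(vInterval)),'h:mm TT')
,floor(frac(timestamp), $(vInterval)))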

Read the thread for details.

-Rob


BOQC: ApplyMap instead of Join

Today marks my first blogging of “Best Of QlikCommunity” (BOQC) where I plan to highlight what I find to be particularly useful or interesting posts on the QlikCommunity Forums.

There are cases when the ApplyMap() function is a very useful alternative to Join. For some time I have noticed both Oleg Troyansky and Michael Nordstrom dropping this hint on QlikCommunity but I never quite understood the power of the technique until a post Oleg made today:
http://community.qlikview.com/forums/p/16017/62222.aspx#62222

The original forum question was how to multiply two fields from different tables to derive a new calculated field. The tables share a common key field.

I usually would have approached this with two Joins. That approach works, but sometimes I don’t really want my data model to reflect Joined tables. I just want to do the calculation.

If you want to see the ApplyMap() solution, read the post linked to above. The thread explains it better than me repeating it here.
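For flavor, here is a minimal sketch of the pattern with made-up table and field names: build a mapping table from one table, then compute inline while loading the other, so no join is needed:

PriceMap:
MAPPING LOAD ProductID, Price
RESIDENT Products;

OrderFacts:
LOAD ProductID, Quantity,
Quantity * ApplyMap('PriceMap', ProductID, 0) as ExtendedAmount
RESIDENT Orders;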


Using MapSubstring() to edit strings

The MapSubstring() function is a powerful alternative to using nested Replace() or PurgeChar() functions.

MapSubstring(), unlike its siblings ApplyMap() and MAP USING, will apply multiple mappings from the mapping table. Here’s an example.

ReplaceMap:
MAPPING LOAD * INLINE [
char replace
( \
) \
" \
, \
/ \
] (delimiter is ' ')
;


TestData:
LOAD
*,
MapSubString('ReplaceMap', data) as ReplacedString
;
LOAD * INLINE [
data
(415)555-1234
(415)543,4321
"510"123-4567
/925/999/4567
] (delimiter is ' ')
;


In field “ReplacedString”, all the characters matching the first field of the map (“char”) are replaced with a backslash, as shown in this table. This makes it ready for parsing with a function like SubField().
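For example, a hypothetical extraction of the area code from the first test row:

// '\415\555-1234' -> '415'
SubField(ReplacedString, '\', 2) as AreaCode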

Another usage is an alternative to nested PurgeChar() to remove multiple characters. A blank is used as the mapping character. For example:

PurgeMap:
MAPPING LOAD * INLINE [
char replace
( ' '
) ' '
" ' '
, ' '
/ ' '
] (delimiter is ' ')
;



MapSubString('PurgeMap', data)
will produce results like this:

-Rob
