All posts by Rob Wunderlich

Using the Google Chart API inside Qlikview

I recently had a requirement to create a heat map of the US States and set about exploring if there was an easier way than creating a QV scatter plot mapped over an image.

I found the Google Chart API. With some help from the QlikCommunity forum, I got a fairly pleasing result.

The Google Chart API is provided free by Google. You pass in an http request with parameters that describe the data and desired layout, and a chart image is returned. I won’t cover the details of the API, it’s well documented at http://code.google.com/apis/chart/. Rather, I’ll share my experience integrating it with QV.

Here’s a screenshot of my results. The map will update as selections are made in the Sales chart. The generated map is not clickable. It’s just a static image.

A working QVW of the above may be found in the Qlikview Cookbook at:
http://robwunderlich.com/Download.html

The map is a Straight Table with a single expression. The expression is the http://… string used to generate the map. The representation for the expression is set to “Image”. Thanks to Tom on the forum for showing me this technique.

The OnAnySelect document event is used to trigger a “showMap()” macro that creates the variables needed for the http string. In a production application, you would want to be more selective and use field level events on the fields relevant to the chart.

In addition to chart layout parameters, the http string contains two parameters that describe the data.

  • chld= provides the list of states
  • chd= provides the data values for the states

The States and Values are associated by ordinal position in the respective lists. I could not find a way to keep the lists in sync by using QV expressions alone. The solution was to use the Sales chart as my “data source”. The macro walks the rows of the table to build two variables — the State codes and the Sales values. Here’s a snippet of the macro code:

set obj = ActiveDocument.GetSheetObject("CH01")
' Collect the locations
locations = ""
for RowIter = 1 to obj.GetRowCount-1
locations = locations & obj.GetCell(RowIter,0).Text
next
ActiveDocument.GetVariable("vValues").setContent values, false

Another area where VBScript was useful was in encoding the data values as specified by the API. I chose the “simple encoding” method. The Sales values are translated to relative values within a range of single characters defined by the API. The doc http://code.google.com/apis/chart/#simple provided a javascript encoding example which I converted to VBS.

The encoding algorithm requires that the maximum value of the dataset be known to properly spread the individual values across the relative range. To determine the maxvalue in the macro, I use the QV evaluate() function to “callback” to the QV expression language.

maxValue = ActiveDocument.evaluate(
“max(aggr(sum(Sales), State))” )

Producing a chart with the Google API does have some downsides. The user must be connected to the internet and the chart will render slower than a native QV chart. It also does not provide for making selections in the chart and tooltip values like a QV chart does. But I found it to be a simple solution to my requirement. I hope that someday the QV product will provide regional maps as chart types.

Update October 3, 2008: Alistair on the QlikCommunity forum has posted an example of calling the Google Chart API without using macros, which I find to be the preferred method:

http://www.qlikcommunity.com/575/?tx_mmforum_pi1%5Baction%5D=list_post&tx_mmforum_pi1%5Btid%5D=3763&tx_mmforum_pi1%5Bfid%5D=9

The next update to the Qlikview Cookbook will include the “macro-less” technique.

Share

Memory sizes for data types

An earlier post of mine When less data means more RAM discussed the ways in which storage (“Symbol” space) needed for field values can increase depending on how a field is loaded or manipulated. This generated some followup questions on the QlikCommunity forum about the optimal storage sizes for fields of various data types.

What’s presented below is information gleaned from the documentation, QT Support and experimentation. The numbers come from the document memory statistics file. I hope someone from QT will help me correct any errors.

QV fields have both an internal and external representation. There is a video “Datatype Handling in Qlikview” available on QlikAcademy that explores this subject.This post is concerned with the internal storage of fields.

Numbers

I’ve found that the storage size appears to be related to the number of total digits. Storage size in bytes, for various digit ranges:

1-10 digits, size=4
11 or more digits, size=13

The above sizes assume that the internal storage format is numeric, which is usually the case if loading from a database. Numbers loaded as text such as from a text file or inline, may be stored as strings which will occupy different sizes.

Dates, Times and Timestamps

Different Database systems provide various degrees of precision in timestamps and I assume the ODBC driver is also involved with the exact value provided to QV during the load. QV times are the fractional part of a day, using up to 9 digits to the right of the decimal point.

– Best size for a Date, 4 bytes.
– Best size for a full Time, 13 bytes.
– Best size for a full Timestamp, 13 bytes.

These sizes can increase when the field is manipulated. Want to get the date portion of a timestamp? Don’t use

date(aTimestamp)

date() is a formatting function, it doesn’t “extract” the underlying date portion. In many cases, it actually increases storage size because the result may be a string. Instead, use

floor(aTimestamp)

this will produce a 4 byte integer result.

A common technique for reducing the memory footprint of timestamps is to separate the timestamp into two fields, integer date and fractional time. You can further reduce the number of unique time values by eliminating the hundredths of seconds, or even eliminating the seconds if your application is ok with minute precision.

Strings

Thanks to QT support for providing this detail on Strings.

“The representation is that each symbol has a pointer (4/8 bytes on 32/64-bit platform) + the actual symbol space. This space is the number of bytes (UTF-8 representation) + 2 (1 is a flag byte and 1 is a terminating 0) + 0, 4 or 8 bytes that store the numeric representation of the field.”

So on the 32bit version, a non-numeric string occupies 6 bytes more than the length of the string itself. A numeric string occupies 10 more bytes. For example:

“a” uses 7 bytes
“1” uses 11 bytes

The only way to reduce the string footprint is to reduce the number of unique values. This can be done by breaking the string into component parts if that makes sense in the application. For example, the first 3 characters of a 10 character product code may be a product class. Breaking the field into ProductClass and ProductNumber fields may reduce the number of unique values.

If the strings are keys that don’t need to be displayed, the autonumber() or autonumberhash128() functions can be used to transform the values to 4 byte integers. With these functions you can also get the “sequential integer optimization” which reduces the symbols space to zero.

I’ve found that concatenating fields in autonumber like
autonumber(f1 & f2)
can sometimes produce false duplicates. Better to instead use autonumberhash128 like
autonumberhash128(f1, f2)
This seems to always produce correct results.

Sequential Integer Optimization

For each field, QV maintains both a Symbol table — the unique values of a field — and a State array that tracks which values are selected. If the symbol values are consecutive integers, a very clever optimization takes place. The Symbol space is eliminated and the State array is used to represent both selection state and value. This is a very beneficial effect of using the autonumber functions.

The values need not begin at zero for the optimization to take place, they only need to be consecutive. A set of 5000 consecutive dates will occupy no Symbol space. Take one date out of the middle and the storage reverts to the standard 4 bytes for each date.

It’s not always necessary to be concerned about memory usage. But when it is, I hope this information proves useful.

Share

64bit Implementation Experience

When I started using Qlikview, I mistakenly believed I would not need the 64bit version of Server. I thought that because my Analyzer users were using the QV Windows Client, the memory required to hold the document would come from the user’s machine. Wrong. When a document is opened from the server, the document is loaded into server memory.

The 32bit Server uses a single 2GB address space to contain all the currently loaded documents. When the number of users increased, and more importantly, the number of concurrent documents, the Server ran out of memory. This unfortunately causes a Server crash, taking all the users down, not just the user that pushed it over the limit. It became clear we needed the 64bit edition.

Upgrading the Server (QVS) to 64bit was easy. It immediately solved the memory issue and allowed for many documents to be used simultaneously with no problem.

QV Publisher (QVP) turned out to be a different story. I initially installed Publisher on the same machine as Server but immediately ran into a problem with the availability of 64bit ODBC drivers.

Any ODBC Driver used in 64bit Windows must be written as 64bit capable. I was using four ODBC data sources – IBM DB2, MS SQLServer, Lotus Domino and SAS. 64Bit SQLServer drivers are supplied with the OS. DB2 64bit drivers are available, but they can be expensive. The sticking point was that there were no 64bit drivers available for Lotus Domino and SAS.

My first step was to move Publisher to a 32bit machine. This turns out to be a recommended practice anyway – host Server and Publisher on different machines. But I also had an application in development that would require 64bit for a full reload. How would I reload this application when it moved to production? I expected I would see more of these applications that required 64bit for reload.

Publisher provides for defining multiple Execution Services (XS) on different machines. XS is the service that performs the reload process. The multiple XS’s can be viewed and managed from a single Publisher Control panel screen. This feature allowed me to define an additional XS on a 64 bit server.

My configuration now consists of three servers. A 64bit QVS, one 32bit QVP and one 64bit QVP. The 32bit QVP is loaded with all the ODBC drivers I need, the 64bit QVP has no drivers installed. The restriction in this configuration is that reloads on the 64bit QVP may only load QVDs and other non-ODBC datasources. In some cases, this may require a script to be split into two or more documents. Thus far, this restriction has proven to be only a minor inconvenience. The two reloads can be connected together by utilizing a RequestEDX task to trigger the second reload task.

We chose not to migrate the developer workstations to 64bit due to the limited availability of ODBC drivers and other software. Most of the applications that require 64bit for reload can still be developed on a 32bit machine by loading a limited number of records. We did set up a single shared 64bit workstation that can be used by any developer when they require 64bit.

Migrating QVS to 64bit provides the capacity to support many concurrent documents and users. If you plan to use the 64bit QVP, check on 64bit driver availability as part of your planning process.

Share

When less data means more RAM

I attended Niklas Boman’s excellent Performance Tuning talk at Qonnections in Miami. One of his tuning recommendations was to reduce the number of rows and columns when possible. This will probably always have a positive impact on chart calculation time, but if done incorrectly, reducing the quantity of data can have an adverse impact on RAM usage.

Consider a QVD file with one million rows. The QVD was loaded from a database and contains two fields:

aNum – unique integers, 1M unique values.
aDate – dates distributed equally throughout 2000-2003, 1,460 unique values.


QV stores each of these values as integers, occupying 4 bytes of RAM each. Nice and compact.

Which of the following statements will create a QVW that uses more RAM? Statement A, which loads 1000K rows or Statement B, which loads only 750K rows?

Statement A:// Load all 1,000,000 rows
LOAD * FROM qvdData.qvd (qvd);


Statement B:
// LOAD only 2001+ which should be 750,000 rows
LOAD * FROM qvdData.qvd (qvd)
WHERE year(aDate) > 2000;

Pat yourself on the back if you answered “B”. B will use more RAM! More RAM for less data? Why? Because “B” causes an unoptimized load which results in QV converting the Integer representations of the data to String representation.

QV can load QVDs in one of two modes – Optimized or Unoptimized (more in the Ref Guide). In an optimized load, the RAM image from the QVD is loaded directly into memory. An optimized load is indicated in the Loading message in the progress window. (Note to development: would be nice if the optimized message appeared in the log as well).

In unoptimized mode, the QVD image is “unwrapped” and the data processed discretly. This causes the internal formatting to be lost and the data is stored internally as Mixed. So each “aNum” that previously occupied 4 bytes, now takes 9 bytes. “aDate” now averages 18.96 bytes each.

It’s the WHERE clause that forces the unoptimized load. Generally, adding fields or anything that causes a field value to be examined will force an unoptimized load. Examples of unoptimzed loads:

LOAD *, year(aDate) as Year FROM qvdData.qvd (qvd) ;
LOAD *, rowno() as rowid FROM qvdData.qvd (qvd)

Even a WHERE clause that does not reference any field will be unoptimized:
LOAD * FROM qvdData.qvd (qvd)
WHERE 1=1;

How can you tell how much RAM a field is using? “Document Settings, General, Memory Statistics” button will generate a .mem text file that contains a storage size for the “Symbols” (values) of each field. You can view the .mem file directly or load it into a QVW for processing. The 8.5 beta provides a “Qlikview Optimizer.qvw” for just this purpose. I’ve uploaded this file to the “Share Qlikviews” section of QlikCommunity if you don’t have 8.5.

WorkaroundsI’ve found that I can usually “fix” the field by setting the desired format in the Document Properties and checking the “Survive Reload” box. You can also apply formats in the load script, but I find this tedious if I have more than a few fields. Here are some alternative workarounds.

To create additional fields, use a RIGHT JOIN after the optimized load.
Instead of:
LOAD *, year(aDate) as Year FROM qvdData.qvd (qvd);

Use:
tdata:
LOAD * FROM qvdData.qvd (qvd);
RIGHT JOIN LOAD DISTINCT *, year(aDate) as Year
RESIDENT tdata;


For a subset selection, version 8 allows an optimized load using where exists() if the exists clause refers to only a single field. This means you’ll have to generate the desired values before the load using the same field name. Something like this:

//Generate table of the dates we want 2001-2004

LET vStartDate=num(MakeDate(2001,1,1)-1);
LET vEndDate=num(MakeDate(2004,12,31));
DateMaster:
LOAD date($(vStartDate) + IterNo()) as aDate
AUTOGENERATE 1
WHILE $(vStartDate) + IterNo() <= $(vEndDate);

// Optimized load of the subset dates
tdata:
LOAD * FROM qvdData.qvd (qvd) WHERE exists(aDate);
DROP TABLE DateMaster; // No longer needed

In some cases, the above example will give you an additional optimization. Something I call the “sequential integer optimization” which I’ll discuss on another day.

Worrying about RAM is not always necessary and many times is not worth the effort, especially if it makes your script harder to follow. However, for large datasets, particularly in the 32bit environment, you may be forced to optimize RAM usage. Using the mem files allows you to identify the most productive candidates for tuning.

The QV Reference Guide points out that an optimized load will run faster than an unoptimized load. I think it would be useful to have brief discussion of the impact on RAM usage as well.

Share