Document Compression

Today I offer up a discussion of Qlikview “compression”. That is, the Qlikview features that make the overall data smaller and, in some cases, larger.

Should you care? In most cases no. But understanding what “knobs you can turn” can be a useful tool for capacity planning and application tuning. Let’s look at the practices and parameters that affect data size.

Script Execution: Data from sources – such as database tables – is read into memory (RAM) by the script execution (reload) process. Duplicate values are reduced to the unique set of values for each column. A “Gender” column has only two values – “Female” and “Male” – so the storage required for this column is minimal compared to a column that has a wide range (cardinality) of values, such as a timestamp. This is not really “compression” but rather what I call “de-duplication”.
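To make the de-duplication idea concrete, here is a minimal Python sketch. It is only a conceptual illustration: the function name `dedupe_column` and the "symbols plus row pointers" layout are my assumptions for the example, not a description of Qlikview's actual internal format.

```python
def dedupe_column(values):
    """Split a column into its unique values plus one small index per row."""
    symbols = []    # unique values, in first-seen order
    position = {}   # value -> index into symbols
    pointers = []   # one entry per row, pointing at a symbol
    for value in values:
        if value not in position:
            position[value] = len(symbols)
            symbols.append(value)
        pointers.append(position[value])
    return symbols, pointers


gender = ["Female", "Male", "Female", "Female", "Male"]
symbols, pointers = dedupe_column(gender)
print(symbols)   # ['Female', 'Male']
print(pointers)  # [0, 1, 0, 0, 1]
```

A low-cardinality column like this stores two strings no matter how many rows there are. A timestamp column, where nearly every row is unique, gains almost nothing from the same treatment.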

The ratio of database storage to document storage is dependent on the data content as well as the use of common script techniques like separating timestamps into date and time fields. A typical database to document ratio is 10:1. For example, 2GB of database tables might require 200MB of document RAM.

QVW write to Disk: After reload, the Qlikview document (data tables and screen objects) is written from RAM to Disk as a *.qvw file. If compression is set on (default) for the document, the qvw will be compressed as it is written to disk. Compression results vary with data content, but the reduction is typically in the range of 2 to 5 times. For example, a document that requires 200MB of RAM will require somewhere between 40MB and 100MB of Disk storage.

If compression is set to “None”, the document will be written to disk in the same format it existed in RAM and will occupy the same storage on disk as it utilized in RAM.
The Compression option for each Document is set in the Document Properties, General tab. The default compression for new documents is defined in the User Settings, Save tab.
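The sizing arithmetic from the preceding paragraphs can be sketched in a few lines of Python. The function name `estimate_qvw_sizes` and its parameters are hypothetical, and the defaults are just the "typical" ratios quoted above (10:1 database to RAM, 2 to 5 times for save compression). Your own data may behave quite differently.

```python
def estimate_qvw_sizes(db_mb, db_to_ram_ratio=10.0, compression_range=(2.0, 5.0)):
    """Rough capacity estimate from the rule-of-thumb ratios in this post."""
    ram_mb = db_mb / db_to_ram_ratio
    low, high = compression_range
    return {
        "ram_mb": ram_mb,
        # (smallest, largest) expected qvw size when compression is on
        "disk_mb_compressed": (ram_mb / high, ram_mb / low),
        # with compression set to "None" the qvw is the same size as in RAM
        "disk_mb_uncompressed": ram_mb,
    }


# 2GB of database tables, treated as 2000MB as in the example above
print(estimate_qvw_sizes(2000))
# {'ram_mb': 200.0, 'disk_mb_compressed': (40.0, 100.0), 'disk_mb_uncompressed': 200.0}
```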

The compression option will of course impact the amount of disk storage used. But it also affects the amount of time it takes to read or write a qvw. I find that for most documents, an uncompressed document will write and read significantly faster than a compressed document. Some documents, especially large ones with high compression ratios, will read faster if compressed. The other factor is speed of the disk being used – local disk or network disk.

I typically do my development with compression off and then do a timing test with both options before migrating to the server.
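The shape of such a timing test can be sketched in Python. This is only an analogy: it uses `zlib` as a stand-in because I am making no claim about which algorithm Qlikview uses, and the helper name `timed_round_trip` is made up for the example. The real test is to save and open your own document with each setting on the disk it will actually live on.

```python
import os
import tempfile
import time
import zlib


def timed_round_trip(payload, compress):
    """Write payload to a temp file and read it back.

    Returns (elapsed_seconds, bytes_on_disk, data_read_back).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(zlib.compress(payload) if compress else payload)
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            raw = f.read()
        data = zlib.decompress(raw) if compress else raw
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return elapsed, size, data


payload = b"Female,Male,Female,Male," * 200_000  # repetitive sample data
for compress in (False, True):
    seconds, size, data = timed_round_trip(payload, compress)
    assert data == payload  # same content comes back either way
    print(f"compress={compress}: {size} bytes on disk, {seconds:.4f}s")
```

The compressed run trades CPU time for fewer bytes on disk. Which side wins depends on the data and on how slow the disk is, which is why it is worth measuring rather than guessing.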

QVW read from Disk: The *.qvw is loaded to RAM by a developer or on the Server by a user session. The amount of RAM required is the uncompressed size, regardless of whether compression was used to write the *.qvw to disk. As discussed in the previous section, my experience is that uncompressed documents read from a local disk typically load faster, but this is not always true and is worth testing on large documents.

What is the compression factor for QVD files?

Zero.

A QVD file contains the physical representation of an in-memory Qlikview Table. This “RAM image” format is what allows an optimized QVD load to be so quick. The physical blocks of disk are read directly into Qlikview RAM, “ready to go”. Because QVD is the RAM image, there is no compression.

A QVD read with an optimized load will require the same RAM size as its size on disk (1:1). A QVD read with an un-optimized load may require significantly more RAM, due to some numeric fields being converted to strings. The expansion is typically about 2:1 but varies considerably.
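As a quick sketch of that rule of thumb in Python (the function name `estimate_qvd_load_ram` is hypothetical, and the default 2.0 expansion is only the "typical" figure mentioned above, which varies considerably):

```python
def estimate_qvd_load_ram(qvd_disk_mb, optimized=True, expansion=2.0):
    """Estimate the RAM needed to load a QVD of the given on-disk size."""
    if optimized:
        return qvd_disk_mb          # RAM image is read as-is, 1:1
    return qvd_disk_mb * expansion  # un-optimized load may expand the data


print(estimate_qvd_load_ram(200))                   # 200
print(estimate_qvd_load_ram(200, optimized=False))  # 400.0
```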

Here is a summary of the various “compression points” and typical results.

| Source | Destination | Ratio | Example Result | Notes |
|---|---|---|---|---|
| Source DB | | | 2GB | Raw Data |
| Source DB | Document RAM | 10:1 | 200MB | Data de-duplication |
| Document RAM | QVW Disk | 3:1 | 67MB | Save Compression=High |
| Document RAM | QVW Disk | 1:1 | 200MB | Save Compression=None |
| QVW Disk | Document RAM | 1:3 | 200MB | Save Compression=High |
| QVW Disk | Document RAM | 1:1 | 200MB | Save Compression=None |
| Document RAM | QVD Disk | 1:1 | 200MB | QVD always uncompressed |
| QVD Disk | Document RAM | 1:1 | 200MB | Optimized load |
| QVD Disk | Document RAM | 1:2 | 400MB | Non-Optimized load |

If your documents are small and you are not experiencing performance issues, worry about none of this.

Compressed documents occupy less disk space and their smaller size makes them easier to manage for moving, backup, etc.

If you are trying to get a document to load faster, try turning off document compression and benchmark your results. Consider the type of disk when making this decision. Compression may be more important in a network storage environment where reducing the amount of data transferred is a significant performance factor.

It’s important to understand that the document compression option has no impact on RAM usage. It only impacts the amount of data read and written to disk.

An Example is Worth a Thousand Thread Replies

There’s a lot of information being exchanged on the QlikCommunity Forum http://www.QlikCommunity.com these days. Customers and Consultants ask technical questions, and other Customers, Consultants and QT employees provide very useful answers. Today’s post is a tip on how to improve the chances of your Forum question being answered quickly and accurately.

Many back and forth replies to a forum thread are about clarifying the question. If possible, post a qvw file example with your question. (I can’t, my file is too big! The data is private! Keep reading for ways to handle these concerns).

Reasons to post an example qvw:

  • An example will help clarify your question. The Forum is conducted in English, but English is a second language for many, if not most, of the Forum users. An example will provide additional understanding of your question.
  • More likely to get an accurate and complete response. Many questions require the responders to fiddle with expression or script syntax. If I have a qvw to work with, I’m more likely to test my answer before posting it, saving you the trouble of learning that I forgot a comma in my recommended solution.
  • Time. Most Forum members answer questions on a volunteer basis and their time is limited. For myself, I can only take the time to answer a limited number of questions. I’m more likely to pick the questions that are clear and provide the data I need. If I have to code up my own test data to work on the problem, I’m less likely to respond.

You may be reluctant to post your qvw for two reasons: size and privacy.

The maximum attachment size allowed on the Forum is 1MB. You can make the example qvw smaller by using the QV Data Reduction feature.

  1. Make some selections to reduce the number of selected values in the qvw.
  2. From the menu bar, select File->Reduce Data->Keep Possible Values.
  3. Use File->Save As to save the reduced copy under a new name.

    If you use “Save”, QV will still open the “Save As” to help you remember not to overwrite the master copy.

You can protect the privacy of sensitive information, such as account numbers, revenue or customer names by using the QV Scrambling feature. In the menu bar, select Settings->Document Properties->Scrambling.

Here you can select a field to scramble and press the “Scramble” button to perform a random scrambling of the field. No one can determine its original contents. Like values will scramble to the same value, which maintains the value linkages.
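The "like values scramble to the same value" property is what keeps your tables linked after scrambling. Here is a small Python sketch of the idea. The function `scramble_field` and its random-token scheme are my own illustration, not the algorithm QV actually uses.

```python
import random
import string


def scramble_field(values, seed=None):
    """Replace each distinct value with a random token, consistently."""
    rng = random.Random(seed)
    mapping = {}   # original value -> token (discarded once we return)
    used = set()   # tokens already handed out, to keep them distinct
    scrambled = []
    for value in values:
        if value not in mapping:
            token = None
            while token is None or token in used:
                token = "".join(rng.choices(string.ascii_uppercase, k=8))
            used.add(token)
            mapping[value] = token
        scrambled.append(mapping[value])
    return scrambled


names = ["ABC Corp", "DEF Co", "ABC Corp"]
result = scramble_field(names)
print(result[0] == result[2])  # True: like values still match
print(result[0] == result[1])  # False: different values stay different
```

Because the mapping is thrown away, the tokens cannot be turned back into the original names, but rows that shared a value before scrambling still share one afterwards.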

In some cases, you may still be unable to post your qvw even with reduction and scrambling. Or it may make your example clearer to post the data inline with your question. In that case, post your example data in the question using comma-delimited format, so it can easily be pasted into a LOAD INLINE. For example:

Accounts:
AccountNo, Name
1234, ABC Corp
4567, DEF Co

Transactions:
AccountNo, TranId, Amount
4567, 1, 2000

One last tip. Before you post, remember to search for existing answers to your question. In the past, search on the Forum was not so robust. But QT has recently added an embedded Google search feature. This is great! It supports the full range of Google search operators. Try it. The “Google Custom Search” link is available at the top of each Forum page.

Finally, don’t forget to mark your question as “solved” when you’ve received a satisfactory answer.

Happy posting!

-Rob
