Today I offer up a discussion of Qlikview “compression”: the Qlikview features that make overall data size smaller and, in some cases, larger.
Should you care? In most cases, no. But understanding which “knobs you can turn” is useful for capacity planning and application tuning. Let’s look at the practices and parameters that affect data size.
Script Execution: Data from sources – such as database tables – is read into memory (RAM) by the script execution (reload) process. Duplicate values are reduced to the unique set of values for each column. A “Gender” column has only two values – “Female” and “Male” – so the storage required for this column is minimal compared to a column with a wide range of values (high cardinality) such as a timestamp. This is not really “compression” but rather what I call “de-duplication”.
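To make de-duplication concrete, here is a minimal Python sketch of a column stored as a list of distinct values plus one small pointer per row. It is a simplified model, not Qlikview's actual in-memory format. `deduplicate` and `bits_per_index` are hypothetical helpers, and the rule that each pointer needs only enough bits to address the distinct values is an assumption of the model.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def deduplicate(column: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Split a column into its distinct values plus one pointer per row.

    Simplified model only; not Qlikview's actual storage format.
    """
    symbols: List[Hashable] = []         # distinct values, in first-seen order
    positions: Dict[Hashable, int] = {}  # value -> its position in symbols
    pointers: List[int] = []             # one entry per row, pointing into symbols
    for value in column:
        if value not in positions:
            positions[value] = len(symbols)
            symbols.append(value)
        pointers.append(positions[value])
    return symbols, pointers


def bits_per_index(distinct_count: int) -> int:
    """Assumed pointer width: just enough bits to address every distinct value."""
    return max(1, (distinct_count - 1).bit_length())


genders = ["Female", "Male", "Male", "Female", "Male", "Female"]
symbols, pointers = deduplicate(genders)
print(symbols)                       # ['Female', 'Male']
print(pointers)                      # [0, 1, 1, 0, 1, 0]
print(bits_per_index(len(symbols)))  # 1
print(bits_per_index(86_400))        # 17 (e.g. one distinct value per second of a day)
```

In this model the two-value Gender column costs two stored strings plus one bit per row. A high-cardinality column such as a timestamp keeps nearly every value in its distinct list and needs much wider pointers.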
The ratio of database storage to document storage depends on the data content as well as on common script techniques, such as separating timestamps into date and time fields. A typical database-to-document ratio is 10:1. For example, 2GB of database tables might require 200MB of document RAM.
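The effect of splitting timestamps can be sketched by counting distinct values before and after the split. The sample data below is hypothetical (one week of per-minute timestamps). In a Qlikview script the same split is commonly written with expressions along the lines of Date(Floor(Timestamp)) and Time(Frac(Timestamp)), where Timestamp is a placeholder field name.

```python
from datetime import datetime, timedelta

# Hypothetical sample: one week of timestamps, one per minute.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(minutes=m) for m in range(7 * 24 * 60)]

# Cardinality of the original column versus the two split columns.
distinct_timestamps = len(set(timestamps))
distinct_dates = len({ts.date() for ts in timestamps})
distinct_times = len({ts.time() for ts in timestamps})

print(distinct_timestamps)             # 10080
print(distinct_dates, distinct_times)  # 7 1440
```

In this sample, 10,080 distinct timestamps become 7 distinct dates and 1,440 distinct times. The gap widens as the date range grows, because the time field's distinct list stops growing once a full day is covered.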
QVW write to Disk: After reload, the Qlikview document (data tables and screen objects) is written from RAM to disk as a *.qvw file. If compression is on (the default) for the document, the qvw is compressed as it is written to disk. Results vary with data content, but the reduction is typically in the range of 2 to 5 times. For example, a document that requires 200MB of RAM will require somewhere between 40MB and 100MB of disk storage.
If compression is set to “None”, the document will be written to disk in the same format it existed in RAM and will occupy the same storage on disk as it utilized in RAM.
The Compression option for each document is set in the Document Properties, General tab. The default compression for new documents is defined in the User Settings, Save tab.
The compression option will of course impact the amount of disk storage used. But it also affects the amount of time it takes to read or write a qvw. I find that for most documents, an uncompressed document will write and read significantly faster than a compressed document. Some documents, especially large ones with high compression ratios, will read faster if compressed. The other factor is the speed of the disk being used: local disk or network disk.
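One way to reason about this trade-off is with a rough model: read time is the time to move the bytes from disk plus, for a compressed document, the CPU time to decompress them. The sketch below assumes that model. The function name and every throughput figure are hypothetical, chosen only to show the shape of the trade-off. They are not measurements of Qlikview.

```python
from typing import Optional


def read_seconds(ram_mb: float, disk_mb_per_s: float,
                 compression_ratio: float = 1.0,
                 decompress_mb_per_s: Optional[float] = None) -> float:
    """Rough read-time model: transfer time plus optional decompression time."""
    disk_mb = ram_mb / compression_ratio       # bytes actually read from disk
    seconds = disk_mb / disk_mb_per_s          # time to transfer them
    if decompress_mb_per_s is not None:        # CPU time to rebuild the RAM image
        seconds += ram_mb / decompress_mb_per_s
    return seconds


DOC_RAM_MB = 200   # document size in RAM, from the running example
DECOMPRESS = 150   # hypothetical decompression throughput, MB/s

# Hypothetical disk throughputs in MB/s.
for label, disk in [("local disk", 500), ("network disk", 20)]:
    plain = read_seconds(DOC_RAM_MB, disk)
    packed = read_seconds(DOC_RAM_MB, disk, compression_ratio=3,
                          decompress_mb_per_s=DECOMPRESS)
    print(f"{label}: uncompressed {plain:.2f}s, compressed {packed:.2f}s")
# local disk: uncompressed 0.40s, compressed 1.47s
# network disk: uncompressed 10.00s, compressed 4.67s
```

Under these assumed numbers, the uncompressed document wins on the fast local disk and the compressed one wins on the slow network disk. Real figures have to come from timing your own documents.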
I typically do my development with compression off and then do a timing test with both options before migrating to the server.
QVW read from Disk: The *.qvw is loaded into RAM by a developer or, on the Server, by a user session. The amount of RAM required is the uncompressed size, regardless of whether compression was used to write the *.qvw to disk. As discussed in the previous section, my experience is that uncompressed documents read from a local disk typically load faster, but this is not always true and is worth testing on large documents.
What is the compression factor for QVD files?
Zero.
A QVD file contains the physical representation of an in-memory Qlikview Table. This “RAM image” format is what allows an optimized QVD load to be so quick. The blocks on disk are read directly into Qlikview RAM, “ready to go”. Because QVD is the RAM image, there is no compression.
A QVD read with an optimized load will require the same RAM size as its size on disk (1:1). A QVD read with an un-optimized load may require significantly more RAM, due to some numeric fields being converted to strings. The expansion is typically about 2:1 but varies considerably.
Here is a summary of the various “compression points” and typical results.
| Source | Destination | Ratio (source : destination) | Example Result | Notes |
|---|---|---|---|---|
| Source DB | – | | 2GB | Raw Data |
| Source DB | Document RAM | 10:1 | 200MB | Data de-duplication |
| Document RAM | QVW Disk | 3:1 | 67MB | Save Compression=High |
| Document RAM | QVW Disk | 1:1 | 200MB | Save Compression=None |
| QVW Disk | Document RAM | 1:3 | 200MB | Save Compression=High |
| QVW Disk | Document RAM | 1:1 | 200MB | Save Compression=None |
| Document RAM | QVD Disk | 1:1 | 200MB | QVD always uncompressed |
| QVD Disk | Document RAM | 1:1 | 200MB | Optimized load |
| QVD Disk | Document RAM | 1:2 | 400MB | Non-Optimized load |
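For capacity planning, the typical ratios in the table can be rolled into a small calculator. This sketch hard-codes the table's example ratios (10:1, 3:1 and 1:2) as assumptions. `estimate` and `SizeEstimate` are hypothetical names, and like the table it treats the whole document's data as if it were stored in QVDs. Actual ratios vary considerably with data content.

```python
from dataclasses import dataclass

# Typical ratios from the summary table (assumptions; real results vary).
DB_TO_RAM = 10        # source DB : document RAM
QVW_COMPRESSION = 3   # document RAM : QVW disk with compression on
UNOPTIMIZED_QVD = 2   # RAM expansion factor for a non-optimized QVD load


@dataclass
class SizeEstimate:
    document_ram_mb: float
    qvw_disk_compressed_mb: float
    qvw_disk_uncompressed_mb: float
    qvd_disk_mb: float
    qvd_optimized_ram_mb: float
    qvd_unoptimized_ram_mb: float


def estimate(source_db_mb: float) -> SizeEstimate:
    """Apply the table's typical ratios to a source database size in MB."""
    ram = source_db_mb / DB_TO_RAM
    return SizeEstimate(
        document_ram_mb=ram,
        qvw_disk_compressed_mb=ram / QVW_COMPRESSION,
        qvw_disk_uncompressed_mb=ram,           # Compression=None: disk equals RAM
        qvd_disk_mb=ram,                        # QVD is always uncompressed
        qvd_optimized_ram_mb=ram,               # optimized load: 1:1
        qvd_unoptimized_ram_mb=ram * UNOPTIMIZED_QVD,
    )


e = estimate(2000)                     # 2GB of source tables, as in the table
print(e.document_ram_mb)               # 200.0
print(round(e.qvw_disk_compressed_mb)) # 67
print(e.qvd_unoptimized_ram_mb)        # 400.0
```

Whatever the input size, the RAM figures do not depend on the QVW compression setting, only the disk figures do.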
If your documents are small and you are not experiencing performance issues, worry about none of this.
Compressed documents occupy less disk space and their smaller size makes them easier to manage for moving, backup, etc.
If you are trying to get a document to load faster, try turning off document compression and benchmark your results. Consider the type of disk when making this decision. Compression may be more important in a network storage environment, where reducing the amount of data transferred is a significant performance factor.
It’s important to understand that the document compression option has no impact on RAM usage. It only impacts the amount of data read and written to disk.