Document Compression

Today I offer up a discussion of Qlikview “compression”. That is, the Qlikview features that make the overall data smaller and, in some cases, larger.

Should you care? In most cases no. But understanding what “knobs you can turn” can be a useful tool for capacity planning and application tuning. Let’s look at the practices and parameters that affect data size.

Script Execution: Data from sources – such as database tables – is read into memory (RAM) by the script execution (reload) process. Duplicate values are reduced to the unique set of values for each column. A “Gender” column has only two values – “Female” and “Male” – so the storage required for this column is minimal compared to a column that has a wide range of values (high cardinality), such as a timestamp. This is not really “compression” but rather what I call “de-duplication”.
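To make the de-duplication idea concrete, here is a minimal Python sketch. It only illustrates the concept of storing each distinct value once and pointing rows at it; the function name `dedupe_column` and the list-based layout are assumptions for illustration, not Qlikview's actual internal format.

```python
def dedupe_column(values):
    """Store each distinct value once; represent rows as small integer indexes."""
    symbols = []    # the unique values, in first-seen order
    index_of = {}   # value -> position in symbols
    pointers = []   # one integer per row, pointing into symbols
    for value in values:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        pointers.append(index_of[value])
    return symbols, pointers


gender = ["Female", "Male", "Female", "Female", "Male"]
symbols, pointers = dedupe_column(gender)
print(symbols)   # ['Female', 'Male']
print(pointers)  # [0, 1, 0, 0, 1]
```

However many rows the column has, the “Gender” values themselves are stored only twice. A timestamp column, where nearly every row is distinct, gains little from this approach.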

The ratio of database storage to document storage is dependent on the data content as well as the use of common script techniques like separating timestamps into date and time fields. A typical database to document ratio is 10:1. For example, 2GB of database tables might require 200MB of document RAM.
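The effect of separating timestamps into date and time fields can be checked with a quick count of distinct values. The sketch below uses made-up data (one timestamp per minute over three days) purely as an illustration.

```python
from datetime import datetime, timedelta

# Hypothetical sample: one timestamp per minute for three days.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(minutes=m) for m in range(3 * 24 * 60)]

distinct_stamps = len(set(stamps))               # every timestamp is unique
distinct_dates = len({s.date() for s in stamps})  # only three dates
distinct_times = len({s.time() for s in stamps})  # one value per minute of the day

print(distinct_stamps)                  # 4320
print(distinct_dates + distinct_times)  # 1443
```

The single timestamp field needs 4320 unique values, while the date and time fields together need only 1443. The gap widens as more days are loaded, because the time values repeat every day.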

QVW write to Disk: After reload, the Qlikview document (data tables and screen objects) is written from RAM to Disk as a *.qvw file. If compression is set on (default) for the document, the qvw will be compressed as it is written to disk. The compression results will vary depending on data content, but are typically in the range of 2-5 times. For example, a document that requires 200MB of RAM will require somewhere between 40MB and 100MB of Disk storage.

If compression is set to “None”, the document will be written to disk in the same format it existed in RAM and will occupy the same storage on disk as it utilized in RAM.
The Compression option for each Document is set in the Document Properties, General tab. The default compression for new documents is defined in the User Settings, Save tab.

The compression option will of course impact the amount of disk storage used. But it also affects the amount of time it takes to read or write a qvw. I find that for most documents, an uncompressed document will write and read significantly faster than a compressed document. Some documents, especially large ones with high compression ratios, will read faster if compressed. The other factor is the speed of the disk being used – local disk or network disk.

I typically do my development with compression off and then do a timing test with both options before migrating to the server.
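The real test is to save the qvw with each setting and time the save and open in Qlikview itself. As a rough stand-in for the trade-off being measured, the Python sketch below times a write-and-read round trip with and without compression. It uses `zlib` purely as an assumed substitute (it is not claimed to be the algorithm Qlikview uses), and the payload and the `timed_round_trip` helper are hypothetical.

```python
import os
import tempfile
import time
import zlib


def timed_round_trip(payload: bytes, compress: bool):
    """Write payload to a temp file and read it back.

    Returns (elapsed_seconds, bytes_on_disk, recovered_payload).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        t0 = time.perf_counter()
        on_disk = zlib.compress(payload, 9) if compress else payload
        with open(path, "wb") as f:
            f.write(on_disk)
        with open(path, "rb") as f:
            raw = f.read()
        data = zlib.decompress(raw) if compress else raw
        elapsed = time.perf_counter() - t0
        return elapsed, os.path.getsize(path), data
    finally:
        os.remove(path)


# Highly repetitive sample data, so it compresses well.
payload = b"Female,Male,Female,Female,Male\n" * 200_000

for compress in (False, True):
    secs, size, data = timed_round_trip(payload, compress)
    assert data == payload  # the round trip must be lossless
    print(f"compress={compress}: {size} bytes on disk, {secs:.4f}s")
```

Which variant wins depends on how much CPU time the compression costs versus how much I/O time the smaller file saves, which is why the result changes between local and network disks.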

QVW read from Disk: The *.qvw is loaded to RAM by a developer or on the Server by a user session. The amount of RAM required is the uncompressed size, regardless of whether compression was used to write the *.qvw to disk. As discussed in the previous section, my experience is that uncompressed documents read from a local disk typically load faster, but this is not always true and is worth testing on large documents.

What is the compression factor for QVD files?
 
Zero.

A QVD file contains the physical representation of an in-memory Qlikview Table. This “RAM image” format is what allows an optimized QVD load to be so quick. The physical blocks of disk are read directly into Qlikview RAM, “ready to go”. Because QVD is the RAM image, there is no compression.

A QVD read with an optimized load will require the same RAM size as its size on disk (1:1). A QVD read with an un-optimized load may require significantly more RAM, due to some numeric fields being converted to strings. The expansion is typically about 2:1 but varies considerably.

Here is a summary of the various “compression points” and typical results.
| Source | Destination | Ratio | Example Result | Notes |
|---|---|---|---|---|
| Source DB | | | 2GB | Raw Data |
| Source DB | Document RAM | 10:1 | 200MB | Data de-duplication |
| Document RAM | QVW Disk | 3:1 | 67MB | Save Compression=High |
| Document RAM | QVW Disk | 1:1 | 200MB | Save Compression=None |
| QVW Disk | Document RAM | 1:3 | 200MB | Save Compression=High |
| QVW Disk | Document RAM | 1:1 | 200MB | Save Compression=None |
| Document RAM | QVD Disk | 1:1 | 200MB | QVD always uncompressed |
| QVD Disk | Document RAM | 1:1 | 200MB | Optimized load |
| QVD Disk | Document RAM | 1:2 | 400MB | Non-Optimized load |
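The arithmetic behind these example figures can be sketched in a few lines of Python. The default ratios are the typical values quoted in this post, not universal constants, and the function name `estimate_sizes_mb` is just an illustration; substitute ratios measured on your own data.

```python
def estimate_sizes_mb(source_db_mb, db_to_ram=10, qvw_compression=3,
                      unoptimized_expansion=2):
    """Rough capacity estimates (in MB) from the typical ratios in this post."""
    ram = source_db_mb / db_to_ram
    return {
        "document_ram": ram,
        "qvw_disk_compressed": ram / qvw_compression,
        "qvw_disk_uncompressed": ram,
        "qvd_disk": ram,
        "ram_after_optimized_qvd_load": ram,
        "ram_after_unoptimized_qvd_load": ram * unoptimized_expansion,
    }


sizes = estimate_sizes_mb(2000)  # the 2GB source database from the example
for name, mb in sizes.items():
    print(f"{name}: {mb:.0f}MB")
```

For the 2GB example this gives 200MB of document RAM, about 67MB for a compressed qvw, and 400MB of RAM after a non-optimized QVD load.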

If your documents are small and you are not experiencing performance issues, worry about none of this.

Compressed documents occupy less disk space and their smaller size makes them easier to manage for moving, backup, etc.

If you are trying to get a document to load faster, try turning off document compression and benchmark your results. Consider the type of disk when making this decision. Compression may be more important in a network storage environment where reducing the amount of data transferred is a significant performance factor.

It’s important to understand that the document compression option has no impact on RAM usage. It only impacts the amount of data read and written to disk.

8 thoughts on “Document Compression”

  1. What’s the difference between an optimized load and a non-optimized load for QVD files? How do we achieve this?
    I use QVD files for my dashboard but I have no idea whether they are optimized or not.

  2. Jochem,

    An optimized load is achieved by loading the QVD straight into memory, without performing any transformations (for example a where-clause). The advantage is that the loading time can be much shorter than with an unoptimized load. For small volumes you probably will not notice any difference.

    You can recognize the optimized load by the “(qvd optimized)” text that appears after the name of your input QVDs in the load log.

    Kind regards,

    Barry

  3. Are you sure, please?
    Because in my case,
    the size of all QVDs = 2.5 GB
    So data source = 2.5*10 = 25 GB

    If I calculate the QVW size (according to this formula: QVWsizedisk = SourceData × (1 – CompressionRatio)), we get 25*(1-0.9) = 2.5 GB
    But in reality, the size of my QVW is 146 MB!!

  4. @sana

    The numbers given in the post were just examples using some typical numbers. The actual factors will vary significantly depending on the composition of your data. The intent of the post was to help understand the compression options and methods, not to provide specific universal factors.
