Reducing Qlik Field Memory

Summary: A look at techniques to reduce the memory footprint of your Qlik Data Models.

There are good reasons to reduce the memory footprint of your Qlik Apps. Smaller Apps tend to perform better and require fewer server resources. Qlik Cloud imposes size quotas and there is a financial cost to exceeding the quota.

In this post we’ll look at techniques to reduce the memory requirements of your data model.

First, lets review how Qlik Sense stores App data in memory.

The Qlik Load script transforms your data into two types of internal data structures.

A Symbol Table is created for each field. The Symbol table contains a list of the distinct values of this field. The overall size of the symbol table is therefore a function of how many values and the width of the values.

Logical Tables are the tables we see created in the data model, i.e. fact and dimension tables. The Logical tables contain one row for each row of data. The tables do not contain the field values directly. Rather they contain a pointer to a value in the associated Symbol table. This pointer is sometime called an Index.

The width of a pointer is the number of bits required to index the number of values in the Symbol table. For example, in the image above we have 5 values for “Year”. This would require 3 bits as 2^3 is 8, which would cover the required 5 values. 2^2 would be too small as that would only cover 4 values.

My QSDA Pro tool shows the memory required for each field, including a breakdown by Symbol and Index.

There are other tools to view field sizes such as Qlik Cloud App Performance Analyzer, Qlik Cloud App Analyzer and App Metadata Analyzer for Qlik client-managed. In my experience all of these report only on symbol space per field, not index space.

I’m going to use QSDA Pro for my demonstrations in this article.

Using less memory for fields involves reducing the size of the symbol table, the index, or both. There are four techniques we can employ to reduce field memory:

Eliminate the field
Reduce the number of field values
Optimize field values to reduce width, including AutoNumber
Reduce the number of rows

Let’s look at each of these options.

Eliminate the field

Removing the field from the data model is the easiest way to reduce memory :). Of course this is only viable if the field is not used by your application. To determine if a field is used, you must consider all the places a field may be referenced — master items, variables, charts, bookmarks, etc — for both public and private content. Fortunately, QSDA scans all public and private content and makes the inuse determination for you.

In some cases you will not want to drop a field as you anticipate it being used in the future. The benefit from removing fields is relative to the size of the field. When considering whether you want to drop a field, look to the largest fields first where you will gain the most benefit.

QSDA Drop Unused Fields Dialog

Reduce the number of field values

Reducing the number of distinct values will reduce both symbol and index space. How might we do this?

A common scenario involves timestamps. For example let’s look at a field named “ShipTime” which displays in our app as hh:mm:ss, reporting the time an order was shipped. The analysis requirement is that occasionally an analyst will need to see this value in an order detail chart. This field has 459,918 distinct values. Something seems off. There are only 86,400 possible hh:mm:ss values per day and we only ship during one shift so I would expect to see no more than 28,800 values.

When I examine the values in the app I discover the field contains an entire Datetime value and the display format hh:mm:ss is being used in the chart. I don’t need the date portion. I’ll use the Frac() function in the script to extract just the time.

Reload followed by a new QSDA Analysis and here’s the result below. A massive reduction in distinct values and associated savings in both symbol and index. This reduces the total memory for this field from 5200KB to 345KB.

Another potential scenario for this field is that the original developer required both the date and time. In this case our approach is to split the datetime into separate Date and Time fields like this. Remember that formatting functions like Date() and Time() do not change values, we must use numeric functions like Floor() and Frac().

If I need to display the “Ship timestamp” in a chart, I’ll do it like this in the chart:

ShipDate & ' ' & ShipTime

A QSDA analysis now shows a total of 863KB for both fields. A big reduction from 5200KB!

Other potential ways to reduce the number of values:

Drop seconds or fractions from Time fields.

Some accounting systems return fractional amounts, or you may have unneeded fractions due to a calculation. Round the final amount to the penny or appropriate magnitude.

Optimize field values to reduce width

Reducing the width of the field values decreases the Symbol space but not the Index space. How can we reduce the width of a value without changing the business meaning of the value? Isn’t a value a value?

We’ll start with the easy way — AutoNumber. The AutoNumber statement (and AutoNumber function) translates values into a set of sequential integers. This is useful for compacting key values. We typically don’t need to see the key values, we just need to them to link our tables together.

AutoNumbering a field is done by adding this statement at the end of your script:

Here’s the before and after sizes for SalesOrderDetailID, showing a total reduction of 143MB.

Note that the symbol space has been completely eliminated! When the symbol set consists of sequential integers, Qlik does not store the symbols. Instead, the index value is used as a proxy for the symbol value.

Now we need to dive a bit deeper into the actual format of a symbol table. The symbol table diagram at the top of this post is a bit of a simplification.

Generally speaking, strictly numeric values can be stored in 8 byte cells. For a field that contains only numeric values the symbol table is an array of 8 byte values.

We can index to the nth value simply by multiplying by 8. This is a very efficient and compact storage format. In QSDA these fields will show an Average Symbol Width of 8.00.

For fields that contain strings, the symbol table layout is a bit different. As strings can be of varying length, an array of fixed cells won’t do. If there are any strings in the field, the entire field is considered “mixed”.

The symbol table for a mixed field is an array of 4 byte pointers that point to symbol values elsewhere in memory. The values consists of:

1 byte flag
The varying length string
1 byte null terminator

The total symbol space for a string is 6 bytes plus the length of the string. The storage for string “a” would be 7 bytes.

Looking at the “LastName” field in the image above, we know that each symbols value carries a 6 byte overhead. We can infer that the average length of the LastName strings is 6.54. That is, 12.54 – 6.

When the value is a Dual() value, the symbol value picks up an extra 4 or 8 bytes to hold the numeric value. 4 bytes is used for integers, 8 for decimal values.

The total symbol space for a dual value is 6 bytes plus the length of the string, plus 4 or 8 bytes.

A field may contain both string and dual types. Dual is a value attribute, not a field attribute. For more on that topic see “Dual Storage vs Dual Behavior“.

Ok, we’ve just gone down a deep rabbit hole. Maybe interesting, but is this important to reducing the overall memory footprint of your app? Generally no, sometimes yes. Let’s look at some examples.

Sometimes fields that are strictly numeric get inadvertently loaded as Dual() values. Certain operations including where clauses in loads can cause this behavior. Frankly, I don’t have comprehensive understanding of all the ways this can happen. But when it does happen, we can see numeric fields having a symbol size of more than 8.

The typical way to return these fields to 8 bytes (if you can’t fix the root cause of the issue) is to wrap the field with Num() or +0 when loading.

Num(UnitPrice) as UnitPrice UnitPrice+0 as UnitPrice

I prefer neither approach. I typically use a TempFormat table instead.

Is it worth the trouble to fix these? At first glance it may look like a big deal, but the memory savings is typically small.

The UnitCost field occupies 40MB and the symbol width should be 8 instead of 13.45. Almost all of the storage is in the Index Bytes. The field has a relatively small amount of values. Making those 2,011 values 8 bytes each would save me a total of 10967 Symbol Bytes. (27055 – (2011 * 8))

All that said there are occasions when you have a larger number of distinct values and this can make a significant difference. I almost always “fix” these fields without giving it too much thought as it’s easy to do with a tool like QSDA.

Reduce the number of rows

Reducing the number of rows that reference a field decreases the Index space, but not the Symbol space.

Wait! Are you suggesting I load less data? No, I’m talking about data model changes that will not change the business value.

A typical opportunity to reduces rows is when you have very sparse data on a large fact table. Consider an OrderDetail table that includes return information on each Order Line. Only 0.5% of Order Lines get returned, so we end up with a lot of Index space pointing to nothing. This is indicated by a low Information Density for the table field.

These four Return fields in the SalesOrderDetail table require 23.28MB.

By moving these fields to a new “Returns” table, linked by SalesOrderDetailID, the Return fields and the linking key now require 227KB — a savings of 23MB, a relatively significant amount.

I hope you found some useful tips in this post. If you want to learn more about how QSDA Pro can help you optimize your Qlik Apps, join me on Nov 19 for the Click Less, Qlik More – QSDA Pr o webinar or reach out to us.

-Rob

4 thoughts on “Reducing Qlik Field Memory”

A.CHAOUCH says:

Nov 15, 2024 at 9:18 am

Rob, another great article, thanks ! below some comments and questions :

According to qlik online help : num() is dual

In Qlik cloud, does “drop fields” after a binary still not free memory ? : https://qlikviewcookbook.com/2013/10/drop-field-does-not-release-all-space/

Other common solutions to reduce overall memory footprint of an app : normalization , Dynamic views , ODAG …

An old and robust (HIC) article as a complement: https://community.qlik.com/t5/Design/Symbol-Tables-and-Bit-Stuffed-Pointers/bc-p/1475410

It would be nice to have synthetic posts on how to reduce memory of
Reloads (incremental load, partial reload …)
Hypercubes (calculated dim,conditionnal kpis, Aggr, flags..)

Other :
Adding the average reading time on the top of article would be appreciated
2^3 = 8

Rob Wunderlich says:

Nov 15, 2024 at 3:11 pm

“According to qlik online help : num() is dual”

In my experience if all values of the field are created within script using the same num() format mask, then you get what I called an “implied dual” in my post on dual behavior. If you mix format masks for the field in the same script, you get explicit duals that are stored as true duals.

I don’t know if the drop fields after binary behavior has changed. I’ll have to think about how to test that in Qlik Cloud.

Adrian says:

Nov 16, 2024 at 10:33 am

Rob,
Firstly, a great article. Thanks for sharing.

When I started as a consultant in the world of ERP our constraints were hard disk space, combined with RAM and compute power that a Raspberry PI today could dance over.

What this meant was that we need to keep our data tidy at source. And we used indexes for any reporting.

This may already be covered above; I see a lot of field name with prefixes that can be removed/replaced, or even mapped to mapping tables.

Does saving bytes in field names actually save much space?

Even if not for analytics, does it anymore for actual storage? We all need to be environmentally conscious and reducing storage overhead has to considered best practice, I would have thought.

My last question, is about Nulls. I hear a lot that Nulls are handled differently in QlikView to Qlik Sense and Qlik Cloud. ie become distinct indexable items and thus explode memory consumption. Is this a myth or true, and what is the easiest way to deal with a View to Sense Null migration?

Thanks

Rob Wunderlich says:

Nov 16, 2024 at 3:19 pm

“Does saving bytes in field names actually save much space?”

I don’t think using shorter field names would save any meaningful storage. I believe the name is only stored once.

I’m not aware of any differences in Null handling between the various editions of Qlik Analytics. I’d be curious if you have any references.

Qlikview Cookbook

Eliminate the field

Reduce the number of field values

Optimize field values to reduce width

Reduce the number of rows

4 thoughts on “Reducing Qlik Field Memory”

Leave a Reply to Rob Wunderlich Cancel reply

Recipes for Qlikview Success