
Reducing Qlik Field Memory

Summary: A look at techniques to reduce the memory footprint of your Qlik Data Models.

There are good reasons to reduce the memory footprint of your Qlik Apps. Smaller Apps tend to perform better and require fewer server resources. Qlik Cloud imposes size quotas and there is a financial cost to exceeding the quota.

In this post we’ll look at techniques to reduce the memory requirements of your data model.

First, let’s review how Qlik Sense stores App data in memory.

The Qlik Load script transforms your data into two types of internal data structures.

A Symbol Table is created for each field. The Symbol Table contains a list of the distinct values of the field. The overall size of the symbol table is therefore a function of how many distinct values there are and how wide those values are.

Logical Tables are the tables we see created in the data model, i.e. fact and dimension tables. The Logical Tables contain one row for each row of data. The tables do not contain the field values directly. Rather, they contain a pointer to a value in the associated Symbol Table. This pointer is sometimes called an Index.

The width of a pointer is the number of bits required to index the number of values in the Symbol Table. For example, in the image above we have 5 values for “Year”. This requires 3 bits, as 2^3 = 8 covers the required 5 values. 2 bits would be too few, as 2^2 covers only 4 values.
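In general, the pointer width is the base-2 logarithm of the distinct value count, rounded up. A quick sanity check written as a Qlik expression (illustrative only; Qlik computes this internally):

Ceil(Log(FieldValueCount('Year')) / Log(2))   // 5 values requires 3 bits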

My QSDA Pro tool shows the memory required for each field, including a breakdown by Symbol and Index.

There are other tools to view field sizes such as Qlik Cloud App Performance Analyzer, Qlik Cloud App Analyzer and App Metadata Analyzer for Qlik client-managed. In my experience all of these report only on symbol space per field, not index space.

I’m going to use QSDA Pro for my demonstrations in this article.

Using less memory for fields involves reducing the size of the symbol table, the index, or both. There are four techniques we can employ to reduce field memory:

  • Eliminate the field
  • Reduce the number of field values
  • Optimize field values to reduce width, including AutoNumber
  • Reduce the number of rows

Let’s look at each of these options.

Eliminate the field

Removing the field from the data model is the easiest way to reduce memory :). Of course this is only viable if the field is not used by your application. To determine if a field is used, you must consider all the places a field may be referenced — master items, variables, charts, bookmarks, etc — for both public and private content. Fortunately, QSDA scans all public and private content and makes the in-use determination for you.

In some cases you will not want to drop a field because you anticipate it being used in the future. The benefit from removing a field is relative to the size of the field. When considering which fields to drop, look at the largest fields first, where you will gain the most benefit.

QSDA Drop Unused Fields Dialog

Reduce the number of field values

Reducing the number of distinct values will reduce both symbol and index space. How might we do this?

A common scenario involves timestamps. For example, let’s look at a field named “ShipTime” which displays in our app as hh:mm:ss, reporting the time an order was shipped. The analysis requirement is that occasionally an analyst will need to see this value in an order detail chart. This field has 459,918 distinct values. Something seems off. There are only 86,400 possible hh:mm:ss values per day, and we only ship during one shift, so I would expect to see no more than 28,800 values.

When I examine the values in the app, I discover the field contains a full datetime value, and the display format hh:mm:ss is being applied in the chart. I don’t need the date portion. I’ll use the Frac() function in the script to extract just the time.
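A sketch of the script fix; the table and source names here are placeholders for your own:

Orders:
LOAD
    OrderID,
    Frac(ShipTime) as ShipTime   // drop the integer date portion, keep only the time
FROM Orders.qvd (qvd);

The chart can continue to format the field as hh:mm:ss; only the underlying distinct values are reduced.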

A reload followed by a new QSDA analysis yields the result below. A massive reduction in distinct values and associated savings in both symbol and index space. This reduces the total memory for this field from 5200KB to 345KB.

Another potential scenario for this field is that the original developer required both the date and time. In this case our approach is to split the datetime into separate Date and Time fields. Remember that formatting functions like Date() and Time() do not change values; we must use numeric functions like Floor() and Frac().
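A sketch of the split, assuming the source field holds the full datetime (the field name is illustrative):

Date(Floor(ShipTimestamp)) as ShipDate,   // Floor() keeps the integer date part
Time(Frac(ShipTimestamp)) as ShipTime     // Frac() keeps the fractional time part

Floor() and Frac() change the stored value; Date() and Time() merely apply a display format to the result.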

If I need to display the “Ship timestamp” in a chart, I’ll do it like this in the chart:

ShipDate & ' ' & ShipTime

A QSDA analysis now shows a total of 863KB for both fields. A big reduction from 5200KB!

Other potential ways to reduce the number of values:

  • Drop seconds or fractions from Time fields.
  • Some accounting systems return fractional amounts, or you may have unneeded fractions due to a calculation. Round the final amount to the penny or appropriate magnitude.

Optimize field values to reduce width

Reducing the width of the field values decreases the Symbol space but not the Index space. How can we reduce the width of a value without changing the business meaning of the value? Isn’t a value a value?

We’ll start with the easy way — AutoNumber. The AutoNumber statement (and AutoNumber function) translates values into a set of sequential integers. This is useful for compacting key values. We typically don’t need to see the key values, we just need them to link our tables together.

AutoNumbering a field is done by adding this statement at the end of your script:
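For example, using the key field from later in this post:

AutoNumber SalesOrderDetailID;

// or, with a quoted wildcard, all fields ending in "ID"
AutoNumber "*ID";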

Here’s the before and after sizes for SalesOrderDetailID, showing a total reduction of 143MB.

Before AutoNumber
After AutoNumber

Note that the symbol space has been completely eliminated! When the symbol set consists of sequential integers, Qlik does not store the symbols. Instead, the index value is used as a proxy for the symbol value.

Now we need to dive a bit deeper into the actual format of a symbol table. The symbol table diagram at the top of this post is a bit of a simplification.

Generally speaking, strictly numeric values can be stored in 8-byte cells. For a field that contains only numeric values, the symbol table is an array of 8-byte values.

We can index to the nth value simply by multiplying n by 8. This is a very efficient and compact storage format. In QSDA these fields will show an Average Symbol Width of 8.00.

QSDA Fields

For fields that contain strings, the symbol table layout is a bit different. As strings can be of varying length, an array of fixed cells won’t do. If there are any strings in the field, the entire field is considered “mixed”.

The symbol table for a mixed field is an array of 4-byte pointers that point to symbol values elsewhere in memory. Each value consists of:

  • A 1-byte flag
  • The varying-length string
  • A 1-byte null terminator

The total symbol space for a string is 6 bytes plus the length of the string. The storage for string “a” would be 7 bytes.

Looking at the “LastName” field in the image above, we know that each symbol value carries 6 bytes of overhead. We can infer that the average length of the LastName strings is 6.54, that is, 12.54 – 6.

When the value is a Dual() value, the symbol value picks up an extra 4 or 8 bytes to hold the numeric value. 4 bytes is used for integers, 8 for decimal values.

The total symbol space for a dual value is 6 bytes plus the length of the string, plus 4 or 8 bytes.

A field may contain both string and dual types. Dual is a value attribute, not a field attribute. For more on that topic see “Dual Storage vs Dual Behavior“.

Ok, we’ve just gone down a deep rabbit hole. Maybe interesting, but is this important to reducing the overall memory footprint of your app? Generally no, sometimes yes. Let’s look at some examples.

Sometimes fields that are strictly numeric get inadvertently loaded as Dual() values. Certain operations, including where clauses in loads, can cause this behavior. Frankly, I don’t have a comprehensive understanding of all the ways this can happen. But when it does happen, we can see numeric fields having a symbol size of more than 8.

The typical way to return these fields to 8 bytes (if you can’t fix the root cause of the issue) is to wrap the field with Num() or +0 when loading.

Num(UnitPrice) as UnitPrice
UnitPrice+0 as UnitPrice

I prefer neither approach. I typically use a TempFormat table instead.
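One way to write the pattern (table, field and source names are illustrative): a throwaway load registers the field as a plain numeric before the real data arrives, and the helper table is dropped at the end of the script.

TempFormat:
LOAD
    Num(0) as UnitPrice   // establishes numeric storage for UnitPrice
AutoGenerate 0;           // zero rows; only the field metadata is created

Facts:
LOAD UnitPrice
FROM Facts.qvd (qvd);

DROP Table TempFormat;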

Is it worth the trouble to fix these? At first glance it may look like a big deal, but the memory savings is typically small.

The UnitCost field occupies 40MB and the symbol width should be 8 instead of 13.45. Almost all of the storage is in the Index Bytes. The field has a relatively small number of values. Making those 2,011 values 8 bytes each would save me a total of 10,967 Symbol Bytes (27,055 – (2,011 * 8)).

All that said, there are occasions when you have a larger number of distinct values and this can make a significant difference. I almost always “fix” these fields without giving it too much thought, as it’s easy to do with a tool like QSDA.

Reduce the number of rows

Reducing the number of rows that reference a field decreases the Index space, but not the Symbol space.

Wait! Are you suggesting I load less data? No, I’m talking about data model changes that will not change the business value.

A typical opportunity to reduce rows is when you have very sparse data on a large fact table. Consider an OrderDetail table that includes return information on each Order Line. Only 0.5% of Order Lines get returned, so we end up with a lot of Index space pointing to nothing. This is indicated by a low Information Density for the table field.

QSDA Table Fields

These four Return fields in the SalesOrderDetail table require 23.28MB.

By moving these fields to a new “Returns” table, linked by SalesOrderDetailID, the Return fields and the linking key now require 227KB — a savings of 23MB, a relatively significant amount.
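A sketch of the change; the Return field names are illustrative:

Returns:
LOAD
    SalesOrderDetailID,
    ReturnDate,
    ReturnQty,
    ReturnReason,
    ReturnAmount
FROM SalesOrderDetail.qvd (qvd)
WHERE Len(ReturnDate) > 0;   // keep only lines that actually have return data

The Return fields are then omitted from the main SalesOrderDetail load, and the two tables associate automatically on SalesOrderDetailID.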

I hope you found some useful tips in this post. If you want to learn more about how QSDA Pro can help you optimize your Qlik Apps, join me on Nov 19 for the Click Less, Qlik More – QSDA Pro webinar or reach out to us.

-Rob


Outer Sets Bring New Reusability

Summary: I demonstrate how outer sets in Qlik expressions can be used to extend the reusability of Master Measures.

The new outer sets feature in Qlik brings new opportunities for Master Measure reuse in Qlik Sense and QlikView. See this post by Henric Cronström for an explanation of the new feature.

Without outer sets, if we wanted to inject set analysis into a master measure, we had to copy the expression text and edit it. Consequently, we were no longer linked to the master item and lost the maintenance and quality benefits of being linked.

Let’s look at an example where outer sets can improve reusability. In our sample scenario, the business rule for “Sales” is sales receipts minus returns and excluding employee purchases. Our Sales master measure might look like this:
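Something like this (field names are illustrative):

Sum({<TransType={'Receipt'}, CustomerType-={'Employee'}>} Amount)
- Sum({<TransType={'Return'}, CustomerType-={'Employee'}>} Amount)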

Note that we had to repeat the same set inside each aggregation. We can simplify the expression by using outer set syntax and defining the set once.
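With the outer set, the common modifier is written once and applies to both aggregations (same illustrative field names):

{<CustomerType-={'Employee'}>} (
    Sum({<TransType={'Receipt'}>} Amount)
    - Sum({<TransType={'Return'}>} Amount)
)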

This does nothing for reusability, but I believe it improves maintainability. The outer parentheses are not strictly necessary in this case, but they do make it clearer that the set applies to the entire expression.

Continuing on we define a COGS measure:
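A sketch with illustrative field names:

Sum({<TransType={'Receipt'}, CustomerType-={'Employee'}>} Qty * UnitCost)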

Note that we could have used the outer set syntax, but I don’t see any advantage to that here.

With the ability to use Measure names in expressions, available in Feb 21 release, let’s define a Margin measure.
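The Margin measure can reference the master measures by name:

[Sales] - [COGS]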

Now we can drag the Sales and Margin measure into a KPI.

Now we want to do Sales and Margin for the current year only. Our data requires that we inject a new set. This is where outer sets really shine. The Current Year Sales measure using an outer set:
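A sketch, assuming a Year field in the data model:

{<Year = {$(=Max(Year))}>} [Sales]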

The Current Year Margin measure:
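And likewise, assuming the same Year field:

{<Year = {$(=Max(Year))}>} [Margin]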

Note that we have maintained our linkage to the Sales and Margin master measures!

In review, here are all five Measures taking advantage of the outer sets syntax.

I’m happy to see outer sets available in the product. I look forward to the day when sets may be master items!

-Rob


QViewer Version 4 is Here

QViewer, the ubiquitous QVD file viewer,  has received a major update. In addition to the fast and beautiful QVD viewing and filtering you are used to, V4 introduces “Folder View”. Folder View allows you to work with folders and subfolders of QVDs, viewing details and opening QVDs from the list.

Here’s what the new Folder View looks like:

 

You have a rich set of options for filtering, sorting and launching actions like viewing or comparing. Want to identify the differences between two QVDs? Take a look at the compare function:

This short video gives an overview of some of the Folder View features and capabilities.

Download QViewer V4 here.

And of course you still have fast viewing and filtering within a QVD in the File View:

 

QViewer is free to use with limits — 100k rows per QVD and 50 QVDs per folder. Purchase a low-cost license to remove the limits and support my work.

Want to explore all the features of QViewer? Visit the help site.

If you have any feedback or enhancement suggestions, visit the EasyQlik Community site.

Happy Viewing!

-Rob


Install QS Document Analyzer on Server

Some people wish to run QS Document Analyzer (QSDA) on the Qlik Sense Enterprise server rather than Qlik Sense Desktop.  While the current installer only supports desktop, you can manually install to the server.

QSDA installs three components: a connector, an application (qvf) and an extension.

Install the latest QSDA on a desktop machine to unpack the components.  You don’t even need QS Desktop installed.  Take note of the folders where the components were installed.

Using the QMC, upload the qvf application to the server. Zip up the qsda-ribbon extension folder and upload it as an extension through the QMC.

Copy the QsAppMetadataConnector folder from the install folder to this folder on the server:

C:\Program Files\Common Files\Qlik\Custom Data

If you are running a multi-node cluster, the connector must be installed on the central and rim nodes.

That’s it!

 


Rob Joins Websy.io

I’m pleased to announce that I’m joining forces with the team at websy.io!  I look forward to bringing the websy training  offerings to locations in the US as an Instructor and event organizer.

I’m also excited about the opportunities to bring websy’s recognized expertise to Qlik Sense Integration and Visualization projects throughout the world.

If you would like to inquire about scheduling a training at your site or would like to engage the experts at websy.io in your project, please contact us.

-Rob

 


QlikView to Qlik Sense Convertor

Are you migrating QlikView Apps to Qlik Sense?  Have you tried the new QlikView Convertor tool in QS 3.2? 

The QV Convertor tool is available from the Dev Hub in QS 3.2+. It’s a pretty slick tool that converts QV Variables, Dimensions and Expressions to Master Items, and converts QV charts to QS Master Visualizations. It does not attempt to replicate the sheets, but expects you to place the visualizations on sheets yourself as required.

It’s a very useful tool with a good UI that allows for filtering and limiting what gets converted.

At the Atlanta Qlik Dev Group meeting on July 13 I’ll be demonstrating the tool and presenting some tips and considerations for doing conversions.   They’ve given me two hours (!) to speak so I’ll be covering several other topics as well.

-Rob


Evaluating a Data Story

I’m midway through Alberto Cairo’s new book “The Truthful Art” and finding it very stimulating.  It’s an interesting time to be a data scientist,  journalist or consumer of data.

“The Truthful Art” encourages us to use data truthfully and fearlessly, and provides processes and principles to do so.

This week I noted a new study published by the Center for Immigration Studies (CIS). A recent Presidential Executive Order asserts that the US is in special danger from travelers from seven particular countries. The order is controversial and is currently being challenged in the courts.

The CIS study found that 72 individuals from the “seven terror-associated countries”  were convicted in terror cases since 9/11/2001.  The study offers this number as evidence of the exceptional danger posed by immigration from the seven countries.

It seems like there may be more of a story here than “72 terrorists from seven countries”. The study provided a link to the raw data used. I undertook an evaluation of the data and conclusions using some of the techniques I had just been reading about.

The date used to select cases in the study was “Conviction Date”. A more meaningful date would be “Offense Date”. Offense Date was not given, but a “Charge Date” was available. I saw this as a better proxy for when the offense occurred. As shown in the table below, the number of days between Charge and Conviction can be quite substantial. Using Conviction Date skews the offense into the wrong time period.

Days from Charge to Conviction

Median: 381 days
75th percentile: 840 days
Maximum: 2407 days

 

Now instead of looking at “72”, I “broadened” my view of the data as Cairo would suggest.  What about the other countries?  Are there slices of the data that provide insight?

When I plotted two country groups — banned and others — over time, an interesting story emerged.  There are no defendants from the banned countries in the last three years of the study. This suggests that travelers from those countries may actually pose less risk than travelers from other countries.

After 9/11, US domestic counter-terrorism efforts were greatly expanded and overhauled.   The decline shown in the chart suggests to me that the current screening procedures are effective and continually improving.

I’m going to continue my journey through “The Truthful Art“.

-Rob


Guest Speakers for Munich Masters Summit

We’ve got a couple of special guest presenters lined up for the Munich Masters Summit for Qlik, 5-7 April.

Ralf Becher of TIQ Solutions GmbH, a Qlik Luminary well known in the Qlik Community, will give a talk titled Spice your Qlik Sense app with Extensions and Widgets. Ralf will present use cases for extensions and showcase some of the incredible extensions he has created.

Nick Webster of Websy LTD will present The Search for Sensey McSenseface. Imagine building a dashboard simply by talking. Nick will demonstrate a natural language interface, combined with Qlik APIs, that allows a user to generate visualizations on the fly by asking questions like “show me sales for Germany”. It’s pretty intriguing stuff.

Those special speakers are in addition to our already packed three day agenda.

I hope to see you in Munich. Can’t make Munich? Maybe you will join us in September 2017 in Boston.

Learn more about the Summit or register for Munich.

 

-Rob


Document Open Processing

Summary: QlikView OnOpen Document triggers fire after the saved opening sheet is calculated.  That sheet may have some heavy calculations that slow the user’s opening experience.

Maybe you already know this, maybe not.

I commonly use a Document OnOpen trigger to make sure that my document opens on the correct sheet. I do this because I can be lazy or sloppy during development and save the document with the wrong sheet open. Whatever sheet I saved with becomes the opening sheet. I hate to reinstall an app just because I saved on the wrong sheet.

Are there any “gotchas” with doing things this way?

I just noticed (after all these years) that even if my OnOpen trigger opens a trivial or empty sheet, my document can take a long time to open.  Why? Because the data requires a lot of decompression? No.

It’s because objects on the saved sheet are getting calculated and then the OnOpen Activate Sheet is being applied!  If my saved sheet has some heavy objects, I may wait a while for those calculations to complete before the Activate Sheet runs and only then do I see output.

This is also true if I use a “Select in Field” OnOpen trigger to reduce my set to something like the current month. The trigger fires after the opening sheet’s charts are calculated with the saved state — which is generally all data.

Of course I can leverage this behavior by making sure I save my document with selections that represent a small set, and then refine using an OnOpen trigger.

-Rob


The Document Analyzer Compare Tool

In my consulting practice, I’m frequently engaged to improve the performance of one or more QlikView applications.   In addition to an improved application,  I typically deliver a report of measured improvement and what was changed to achieve the improvements.

I use the Document Analyzer tool as my tuning workbench.  I’ve developed, and made available as a free download,  a companion  “Document Analyzer Compare Tool” that automates the comparison  and reporting of different Document Analyzer runs.  You can download both tools from the QlikView Cookbook Tools section.  Let’s look at the details.

Document Analyzer v3.0 introduced an option to save analysis results to a QVD.

 

When set to “Yes”, a reload of DA will create a results QVD named
DocumentAnalyzerResults_appname_YYYYMMDD_hhmmss.QVD

The results QVD will be created in the same directory as Document Analyzer.qvw.

The “Optional Analysis Title” will be included in the QVD. Use this label to identify the phase of your tuning or development work, such as “baseline” or “final”.

The DA Compare Tool can consume and report on these result QVDs.   To load results, open DACompareTool.qvw and enter a directory path on the “Main” sheet. Press the “Load Result QVDs” button and the directory will be scanned for result QVDs.

A summary table will show which result QVDs were loaded. Note that each version of the DACompareTool requires results from a minimum version of Document Analyzer. DACompareTool v1 requires Document Analyzer v3.3 or later. Results from versions earlier than the minimum will not be loaded, but will be reported in log messages.

The “Trending” sheet will display gross performance metrics for all results loaded. You can of course filter which results appear on these charts using listboxes.

The remaining sheets are the “Compare” sheets, which compare two result sets. The compare sheets use a naming convention common to “diff” programs — one result is selected as the “Left” side and another as the “Right”. Left/Right reflects their position in the comparison charts.

On any compare sheet, use the “Select Analysis” button to select two analyses.

Close the Select Analysis dialog by clicking the X in the upper right of the dialog. This will lock your Left/Right selections.

Throughout the UI, data associated with “Left” will be light green and “Right” will be light blue.

Now that two results have been selected, various comparisons are available. Items associated with only one side will be highlighted in the associated color. You can compare and filter:

Object Calculation times:

Expression contents:

Script Tabs and Lines:

Table Names and Sizes:

The orange color indicates that both sides contain the named table, but with a different row count.

I’ve also included a “Server RAM” sheet that can be helpful in computing the RAM required for a set of QVWs.  You will still need to add in estimated cache requirements, but it’s a good starting point.

“Concurrent User Sessions” is an input field allowing you to model an estimated number of concurrent users for each document.

DACompareTool looks best if the Open Sans font is installed. If Open Sans is not installed, you’ll see a warning on the “Introduction” sheet. It will still work without Open Sans, just not as pretty. You can download Open Sans from Google Fonts.

If you do  performance tuning, I hope you find the DACompareTool useful.  Let me know in comments here or on this site’s contact form if you have suggestions or questions.

I’ll be diving deeper into Document Analyzer and the Compare Tool at my Feb 2 “Document Performance Tuning” class and in an expanded Performance Tuning session with Oleg Troyansky at the 2017 Masters Summit for Qlik.  I hope to meet you at either or both for a deep dive into the principles and practice of Qlik performance.

-Rob

 
