Category Archives: Scripting

Optimized Load Script Patterns

While we have an increasing number of methods to create Qlik Data Models, most are still created using plain old script and we care about the elapsed runtime of the script.

Qlik QVD files load very fast when loaded in “optimized mode”. An optimized load can transfer the data from disk directly to Qlik’s internal memory format and is generally about 10x faster than a non-optimized load. A QVD is loaded optimized if:

  • No fields are added
  • No field transformation
  • No Where clause except for single parameter Exists(), or two parameter Exists() where both fields are in this Load.
  • Field Rename using AS is allowed

There are some common script requirements, e.g. filtering a QVD on a year value, that will break the optimization. What follows are some typical scenarios and script patterns to help you maintain optimized Loads.

Script patterns that let you maintain optimized Load can be:

  • Use Where Exists() to select rows. Subsetting with Join is possible but slower and uses more memory.
  • Use a subsequent Load Resident to add calculated fields.
  • AutoNumber statement
  • Touchless formatting vs inline formatting.

Selecting QVD Rows

The optimal pattern for filtering rows from a QVD is to load a temp table with values and filter the QVD load using Where Exists(field).

TempYear:
LOAD * INLINE [
Year
2022
2023
];

Data:
LOAD *
FROM [lib://data/sales.qvd] (qvd)
Where Exists(Year)

A slower option that allows for more conditions is an Inner Join.

Data:
LOAD *
FROM [lib://data/sales.qvd] (qvd);

INNER JOIN (Data) LOAD * INLINE [
Year, Region
2022, US
2023, US
];

When you want two separate tables that contain only intersecting rows with multiple fields, you can’t use Where Exists(field). Instead use the Keep prefix.

For example, Table “RepairOrders” has already been loaded. “RepairOrders” contains fields ModelId and PartId, linking to “PartsMaster.qvd”. Load matching rows from PartsMaster as a separate table, keeping the load optimized.

PartsMaster:
LEFT KEEP(RepairOrders)
LOAD *
FROM [lib://data/ParstMaster.qvd] (qvd);

Subsequent Load Resident

Create additional derived fields in a Load Resident after the optimized QVD Load.

TempSales:
LOAD *  
FROM [lib://data/sales.qvd] (qvd);

Sales:
LOAD *,
  Total - Cost as Margin
Resident TempSales;

Drop Table TempSales;

If you are certain your table contains no duplicate rows, you can create the new fields using Join instead of creating a new table:

Join (Sales) LOAD *,
  Total - Cost as Margin 
Resident Sales;

AutoNumber Statement

It’s usually a good idea to AutoNumber key fields to save memory. Don’t use the AutoNumber() function in a Load. Instead use the AutoNumber statement at the end of the script.

AutoNumber *fieldlist;

Note that you can use wildcards in the fieldname. This is really handy if your keyfields follow a naming pattern like “Key*”.

Touchless Formatting

“Touchless formatting” is my invented term for the technique of formatting fields by using a temp format table. This is useful when you need to change or set specific date or number formats for a QVD fields. If you add the formatting function to the QVD Load, you lose the optimized Load. You can read more about Touchless Formatting here.

Here is an example of changing the EU format dates in QVD fields to US format dates.

// Load some dummy fields just to assign formats
TempFormatTable:
LOAD
 Date(0, 'MM/DD/YYYY') as OrderDate,
 Date(0, 'MM/DD/YYYY') as ShipDate,
 Num(0, '#,##0.00') as OrderTotal
AutoGenerate 1;

Facts: // Load the QVD
LOAD * FROM [lib://data/OrdersEU.qvd] (qvd);

DROP TABLE TempFormatTable;  // Drop temp table

I hope you’ve found these tips a useful reference. Happy Scripting!

Share

Regular Expressions in Qlik Sense

Summary: A rich set of regular expression functions has recently been added in Qlik Sense cloud and client managed. In this post I’ll introduce some use cases for this feature and show how to instantly become a regular expression master.

Regular Expressions (RegEx) is a powerful string pattern matching language that has been available in many programming and scripting languages for decades. Qlik Sense Cloud and the client-managed May 2025 release brings the full power of RegEx to Qlik via a new set of Chart and Load Script functions.

What can you do with RegEx?

  • Validate that a field value matches a specific pattern. e.g. Is this a valid phone number or credit card?
  • Extract substrings from a larger string.
  • Reformat a string to match your rules.

Here’s an example to get us started. I’ll validate that “Phone” values match the allowed format for US phone numbers.

The string ‘^(?:([2-9]\d{2})\ ?|[2-9]\d{2}(?:-?|\ ?))[2-9]\d{2}[- ]?\d{4}$’ is a regular expression pattern that represents the validation rule for US phone numbers. Like any powerful and syntactically terse programming language, RegEx patterns can look a little daunting at first. Here’s the explanation of what that pattern enforces.

This regular expression for US phone numbers conforms to NANP A-digit and D-digit requirements (ANN-DNN-NNNN). Area Codes 001-199 are not permitted; Central Office Codes 001-199 are not permitted. Format validation accepts 10-digits without delimiters, optional parens on area code, and optional spaces or dashes between area code, central office code and station code. Acceptable formats include 2225551212, 222 555 1212, 222-555-1212, (222) 555 1212, (222) 555-1212, etc.

Did I write that pattern myself? No, I copied it off an internet site. There are many internet repositories with thousands of patterns you can copy from. Those repositories have quickly been supplanted by AI chatbots.

Let’s look at another example. I recently answered a question on Qlik Community where the poster wanted to extract First & Last names from strings like:

James SmithMary JohnsonJohn WilliamsPatricia BrownRobert Jones-GarciaMichael MillerLinda Davis

The Qlik function ExtractRegEx is the function we want to use. I know enough RegEx off the top of my head to create a pattern to split on the capital letters and spaces. But how do I deal with the hyphen in “Jones-Garcia”? It would make my head hurt.

AI is a great tool for generating RegEx patterns! Turn to your AI chatbot of choice. I use Copilot but I expect ChatGPT or any of the popular bots would do the job as well.

Asking Copilot to generate the pattern

The nice thing about this approach is I don’t have to explicitly state “..and handle hyphens”, I just supply sample data.

Is this pattern correct? You could just plug it into Qlik but I prefer to first validate using a RegEx tester. There are a number of free tools, regex101.com is the one I usually use.

After validating and potentially tweaking the pattern we use it in Qlik script like this:

I do recommend you spend 30 minutes learning the basics of RegEx to make best use of assistants and testing tools.

You may have used the Qlik script SubField function to split a string in multiple rows based on a delimiter such as a comma. But SubField won’t do when comma is part of a quoted string as is common in CSV files. The new SubFieldRegEx function handles this with ease.

Don’t quit if RegEx only gets you 90% of the way there. Stack some Qlik functions like the where clause above to finish the job.

RegEx implementations in other languages use an option flag to indicate a case-insensitive operation. Qlik has opted to use an additional function with the suffix “l” to indicate a case-insensitive operation. So:

  • MatchRegEx – case sensitive
  • MatchRegExl – case insensitive

A really nice feature of the Qlik MatchRegEx function is that it allows for multiple match patterns to be specified.

The Qlik RegEx functions are documented in the String Functions section. I do wish they had given the functions a separate section. I’ll summarize the available functions here with the caveat that these are correct at this time of this posting.

  • ExtractRegEx() extracts text from an input string expression using the specified regular expression pattern. The function returns a null value if no matches are found.
  • IndexRegEx() searches the input string and returns the starting position of the nth occurrence of the specified regular expression pattern.
  • MatchRegEx() compares the input string with one or more specified regular expression patterns, and returns the numeric location of the regular expression patterns that match.
  • ReplaceRegEx() returns a string after replacing one or more matches between an input string and a specified regular expression pattern.
  • SubFieldRegEx() extracts text from an input string expression, using the specified regular expression pattern as a delimiter. 

There are a number of useful optional parameters, see the documentation for details. There is also ReplaceRegExGroup that I have not yet wrapped my head around. Feel free to post a use case in the comments.

Are you going to use these functions a lot? Probably not. But when you need to clean messy data or extract relevant data these functions can be invaluable. I’ll leave you with one last example, extracting data within HTML tags.

-Rob

Share

Section Access Secret

I was recently helping a colleague debug a problem where Section Access in a Qlik app was not working correctly. It was not allowing access to listed users. Amongst other script he had an Inline load that looked something like this:

I pointed to the values of the USERID field and said “There’s your problem. The USERID values have to be in upper case. All values in a section access table have to be in upper case. It’s in the Doc”.

“No…” he replied, and proceeded to show me other Apps that used lower case USERID values and did work correctly. I looked at a few of my own apps and found some with lower case values, working correctly.

Why did the problem App fail while others worked? What was different? The difference is that in the problem App, the Load Inline occurred in Section Application and was subsequently Loaded Resident in Section Access. So what? I would have never found the cause but for a Qlik Community post from 2012.

When Load Inline is used within the script Section Access “section”, all data is automatically transformed to upper case.

Whoa! Double whoa!

Load sources other than Inline are not transformed in Section Access. And the magic transformation doesn’t happen in Section Application. I’ve never seen this behavior in the Qlik Doc.

Did everyone know this but me? Is it documented somewhere?

-Rob

Share

Load Statement – where to add field?

Summary: I explain the mystery of why sometimes script fieldname changes flow through to expressions and sometimes not.

When adding a new field to an existing Qlik Script Load statement, where do you insert the new field in the field list? Top, bottom, alphabetically?

Where you insert the new field can break existing chart expressions — if you also rename fields using “as”. Let’s look at an example. We have an existing KPI using the expression “Sum(Sales)”.

If we modify our script to rename field [Sales] to [Total Sales]

Qlik will “automagically” update the field name in our KPI expression. Great!

What if we make the script change like this?

Our KPI is broken! No automagic! Why?

The System fields $Table, $Field, $FieldNo provide us a clue. “Sales” is the nth ordinal field in table “Orders”.

Automagic fix-up depends on the $FieldNo remaining constant. Inserting or deleting a field in the Load statement may change the $FieldNo. That’s when things break.

To avoid breaking existing expressions, follow these practices:

  •  Don’t rename and restructure script in the same reload. A limited load of one row is enough to do rename fix-ups.
  •  If in doubt, add new fields to the end of your LOAD statement.  
  • Use QSDA Pro to audit for broken measures.

-Rob

Still time to register for the Masters Summit for Qlik Cloud in Vienna 30 Sept – 02 Oct. We’ll be training on all things Qlik Cloud as well as favorite topics in scripting, set analysis, performance tuning and integration. See the agenda here.

Share

Using Attributes in QVD Files

Here’s a fun Friday post.

As I was updating my incremental load materials for the upcoming October/November Masters Summit for Qlik, I ran across an old note about creating “Attributes” in QVDs using Tags. Due to a bug in QVD generation I couldn’t implement the idea at the time. But the bug is long since fixed so here I present the general idea of “QVD Attributes” (my name, not an official Qlik thing).

We can assign Tags to Qlik Tables using the script TAG statement:

Tag TABLE TranData with 'MaxId=$(vMaxId)';
Tag TABLE TranData with 'Source=Salesforce';

If the table “TranData” is subsequently written to a QVD using the STORE statement, the tag will be included in the QVD file.

We can read tags directly from a QVD with script:

LOAD String%Table as Tag
FROM [lib://mylib/TranData.qvd]
(XmlSimple, table is [QvdTableHeader/TableTags/String]);

Note that we are reading only the XML header from the beginning of the file. This is very fast.

If we use a structured tag format like “key=value” we can create a generalized “Attribute” store in a QVD that can tell us something about the contents of that QVD. That could be useful when we want to use the QVD in a subsequent script.

How do we retrieve these tags in a useful and consistent way? I think loading them as variables would be a good approach. We can create a reusable subroutine to load all “key=value” tags from a QVD and generate variables named “Tag:key”. We then call the Sub, using the path to any QVD:

Call LoadTags('lib://QVDFiles/TranData.qvd');

We might utilize the new variables in something like an incremental Load.

SQL SELECT * From Trans Where Id > $(Tag:MaxId);

Or as a control to conditionally execute a script block.

If '$(Tag:Source)' = 'Salesforce' Then
// Use the Salesforce Load statement
EndIf

What is all this stuff? Subroutines? Preceding Load? Incremental Load? Tags? Join me at the Masters Summit for Qlik in Dublin or Orlando where we will teach you all about these topics and make you a scripting Master.

At the bottom of this post is a complete sample including the subroutine. You only need to change the “QVDFiles” to your lib to run this on your system.

-Rob

// Subroutine to Load Tagname=Value pairs from a QVD as "Tag:Tagname" Variables.
Sub LoadTags(_path)
    _TempTags:
    Load
        SubField(Tag, '=', 1) as TagName,
        SubField(Tag, '=', 2) as TagValue 
    ;    
    LOAD String%Table as Tag
    FROM [$(_path)] 
    (XmlSimple, table is [QvdTableHeader/TableTags/String])
    Where String%Table like '*=*';
    
    For _i = 0 to NoOfRows('_TempTags') - 1
        Let _vname = 'Tag:' & Peek('TagName', $(_i), '_TempTags');
        Let [$(_vname)] = Peek('TagValue', $(_i), '_TempTags');
    Next _i
    Set _vname=;
    Set _i=;
    Drop Table _TempTags;
End Sub
// END of Subroutine


// Generate some sample data
TranData:
LOAD  
	RecNo()*2 as Id,
	Rand() as Amount,
    'A' as Customer
AutoGenerate 6;

// Something we may want to record from the data is the max Id value, for later incremental load. 
TempId:
LOAD Max(Id) as MaxId Resident TranData;
Let vMaxId = Peek('MaxId');
Drop Table TempId;

// We can tag with $(variable) values or literal values. 
Tag TABLE TranData with 'MaxId=$(vMaxId)';
Tag TABLE TranData with 'Source=Salesforce';
// Some other tag that is not part of our name=value scheme.
Tag TABLE TranData with "ABC";
// STORE and DROP the QVD
Store TranData Into [lib://QVDFiles/TranData.qvd] (qvd);
Drop Table TranData;


// Call our subroutine to load tags as variables, from the disk QVD.
Call LoadTags('lib://QVDFiles/TranData.qvd');

// Demonstrate how we might use these tag values.
// In a SQL Where clause.
SET select = SQL SELECT * From Trans Where Id > $(Tag:MaxId);
Trace $(select);

// In a conditional to select a block of script.
If '$(Tag:Source)' = 'Salesforce' Then
	Trace Going to use the Salesforce load statement;
EndIf
Share

Creating Time Groups in Your Data Model

I frequently create Date groups in my Qlik apps to facilitate selections like “Yesterday” or “Last Week”. Some years ago I published a pattern for creating Date groups in the Recipes section of QlikCookbook.com and it’s been a very popular download. I’ve responded to a few requests on Qlik Community looking to do the same thing for Time groups, so I decided to publish a recipe for Time Grouping as well.

The recipe demonstrates selecting ranges such as “Day Shift”.

Selecting by Time Group

How do we create ranges that map to our data? In the script, create a Range table that defines range Names, Start and End times.

Ranges:
LOAD
Range as Range,
Time#(RangeStart, 'hh:mm:ss') as RangeStart,
Time#(RangeEnd, 'hh:mm:ss') as RangeEnd
INLINE [
Range, RangeStart, RangeEnd
Midnight Shift, 00:00:00, 05:00:00
Early Morning, 05:00:01, 09:00:00
Daylight, 06:00:00, 17:59:59
Day Shift, 09:00:01, 17:00:00
Early Evening, 17:00:01, 20:00:00
Evening, 20:00:01, 23:59:59
12am-6am, 00:00:00, 05:59:59
6am-12pm, 06:00:00, 11:59:59
12pm-6pm, 12:00:00, 17:59:59
6pm-12am, 18:00:00, 23:59:59
];

Then use IntervalMatch to link the time field in our data (EventTime in this example) with the Range field.

JOIN (Ranges) IntervalMatch (EventTime) LOAD RangeStart, RangeEnd RESIDENT Ranges;
// No longer need RangeStart, RangeEnd, but may keep them for documentation or debugging.
DROP FIELDS RangeStart, RangeEnd;

There are additional notes provided in the downloadable example, but that’s it for the basic pattern. You can download the example from here Time Grouping

You may have noticed I used Time#() to read the text time values. That makes it easy to write the range values as readable hh:mm:ss.

If you are generating the ranges using another technique like Interval#() or division, be mindful of rounding. A Qlik time value is represented numerically by the fraction of 1, a whole day. 0.5 is 12:00:00 PM, halfway through the day.

There are four methods I’m aware of to generate a Qlik time. For example to generate the numeric time value corresponding to 12:00:05 AM:

Time#('12:00:05', 'hh:mm:ss')
Interval#('12:00:05', 'hh:mm:ss')
MakeTime(0,0,5)
5/86400

All four will generate a floating point number to represent the time. Will they produce exactly the same result? Results that would pass an equality test?

Time#() = Interval#() = MakeTime() = x/y?

For some inputs they are equivalent. For a significant number of inputs the odd man out is Interval#(). Given the 86,400 second values in a day, Interval#() will not match the others 36% of the time. Try sharing that interesting fact with your spouse at the breakfast table.

Join me at the Masters Summit for Qlik in Dublin or Orlando this fall to learn advanced scripting techniques for Qlik as well as inside tips. Our team of expert consultants will cover data modeling, set analysis, visualization techniques and Qlik integration as well as answer your “how do I” questions. Register now or get more information at the Masters Summit for Qlik website.

Happy Scripting
-Rob

Share

Roll Your Own Qlik “Connector”

Summary: I demonstrate how to use a simple Python REST server to wrap any data source for loading by the Qlik REST connector.

In my years of Qlik development I have many times needed to load data from a source not supported by available connectors. Sometimes the data resides on a factory floor controller and can only be accessed using something like telnet. One approach is to create a batch process that extracts the data to a flat file accessible to Qlik. This works, but you have to deal with issues of timing and failure monitoring.

If you want real time direct loading from Qlik script, you can write a custom connector for your data source. Writing a custom connector is a programming chore of medium complexity and there are limited languages you can work in.

As an alternative to a custom connector or batch process, I’m going to show how you can use the out-of-the-box Qlik REST connector to access your special data, wherever it lives!

REST is a popular way of exchanging data between systems. Data is requested from a REST Endpoint using a URL like:

http://localhost:3000/customers

Data is returned in JSON format like:

[{"Name": "Abc corp", "City": "New York"},
{"Name": "GeoFarm": "City": "Boston"}]

The Qlik REST connector translates JSON data into the rows and fields of a Qlik table. This URL request + JSON output makes it very easy to test REST endpoints using a browser.

Due to REST’s popularity, the tools we need to create REST endpoints are available in just about all programming languages, platforms and lowcode services. Use whatever language you are familiar with or that runs in your environment. I’m going to show a quick example using Python. If you want to create this example yourself, make sure you have Python 3+ installed along with the packages Flask and Pandas.

For my “special data source”, I’m going to use output from the Windows “Tasklist” command. The output looks like below.

Our REST Endpoint must provide three components:

  1. A web server that will handle the incoming URL.
  2. Data from our special/custom data source.
  3. A conversion of data from our native format to JSON.

Here is a complete Python program that satisfies all three requirements.

from flask import Flask, json
from io import StringIO
import pandas as pd
api = Flask(__name__)

@api.route('/tasklist', methods=['GET'])
def get_tasklist():
    import subprocess
    subprocess = subprocess.Popen("tasklist /FO CSV", shell=True, stdout=subprocess.PIPE, text=True)
    csvStringIO = StringIO(subprocess.stdout.read())
    df = pd.read_csv(csvStringIO);
    return Response(df.to_json(orient='records'), content_type='application/json')
if __name__ == '__main__':
    api.run(port=3000, host='0.0.0.0') 

Save this code in file “SimpleRest.py” and start the python task with the command:

python SimpleRest.py

In a Qlik App, create a new REST connection. The URL will be our new REST endpoint. If your Python task is running on other than “localhost”, change the host in the URL.

Give it a Name. Test and Save.

Use the “Select data” button on the connection and select the “root” table. The Qlik wizard will build the required script.

Reload and you should have data!

Let’s break down the pieces of the python program to understand what you would change and what you would re-use in a new scenario.

The first four lines import required libraries and initialize Flask, the REST library we will use. Here is the remaining code.

@api.route('/tasklist', methods=['GET'])
def get_tasklist():
    import subprocess
    subprocess = subprocess.Popen("tasklist /FO CSV", shell=True, stdout=subprocess.PIPE, text=True)
    csvStringIO = StringIO(subprocess.stdout.read())
    df = pd.read_csv(csvStringIO);
    return Response(df.to_json(orient='records'), content_type='application/json')
if __name__ == '__main__':
    api.run(port=3000, host='0.0.0.0') 

The bolded lines above start a webserver listening on port 3000 and declare what code will run when the URL path is “/tasklist”.

@api.route('/tasklist', methods=['GET'])
def get_tasklist():
    import subprocess
    subprocess = subprocess.Popen("tasklist /FO CSV", shell=True, stdout=subprocess.PIPE, text=True)
    csvStringIO = StringIO(subprocess.stdout.read())
    df = pd.read_csv(csvStringIO);
    return Response(df.to_json(orient='records'), content_type='application/json')
if __name__ == '__main__':
    api.run(port=3000, host='0.0.0.0') 

These two lines above run the tasklist /FO CSV command and capture the output. To make wrangling easier I added tasklist options “/FO CSV” which generates output in CSV format.

@api.route('/tasklist', methods=['GET'])
def get_tasklist():
    import subprocess
    subprocess = subprocess.Popen("tasklist /FO CSV", shell=True, stdout=subprocess.PIPE, text=True)
    csvStringIO = StringIO(subprocess.stdout.read())
    df = pd.read_csv(csvStringIO);
    return Response(df.to_json(orient='records'), content_type='application/json')
if __name__ == '__main__':
    api.run(port=3000, host='0.0.0.0') 

Lastly we wrangle the output from CSV to JSON format and send the data to the requestor.

I hope you can see from this example how easy it can be to expose your custom data to Qlik and have real time loading. I used Python for this example, but you could implement this same example in many languages with little effort.

if you’ve created my example in your own environment, you should now be able to create a treemap chart of Memory Usage. Tip: parse the [Mem Usage] field in the script with

Num#(SubField([Mem Usage], ' ', 1), '#,##0') * 1024 as [Mem Usage]

Want to talk more data loading techniques? Join me at the Masters Summit for Qlik in Fall 2022 where in addition to heads down learning, we’ll be kicking the tires on things like Server Side Extensions and Qlik Application Automation.

-Rob

Share

How to Segment QVD Files

Summary: In this post I discuss when you may want to segment or “shard” QVD files and demonstrate some script patterns for producing and consuming segmented QVDs.

I recently received a question from a colleague who knew that I had done a lot of work with Qlik QVD files. He asked, “what’s the optimal size for a QVD file?”

My response was that in my experience there is no optimal physical size, but in many cases there are reasons to break a large QVD into multiple, smaller files. Dividing a large file into smaller files is called “segmenting” or “sharding”.

People generally start to think about segmenting QVDs when they encounter resource constraints while updating a QVD with an incremental load. In an incremental load, only new or changed rows are extracted from the database and merged with an existing QVD. This involves reading and writing the entire large QVD which can use significant time and I/O resources. This also means that the process takes increasingly longer as time marches on. Not pleasant.

Other reasons you may want to segment are consumption and archival patterns. It’s common to use the “latest n periods” of data in your active dashboard, for example the current 12 months. If the data is in a single large QVD, you have to roll off (delete) data older than 12 months. You can filter as you load, but that becomes an increasingly wasteful activity over time.

It’s likely that you will want to retain the older data for use in special projects or historical dashboards.

Given the example scenario above, it would make sense to create one QVD for each month. This will provide predictable performance for incremental updates as well as the dashboard load. Older data could be kept forever and any set of months could be accessed efficiently.

How do we do perform this magic segmentation? Let’s assume an example QVD with these fields:

TransactionID: Primary key
TransactionDate: Original date of transaction
LastUpdate: Timestamp of last update to this row. Transactions may receive updates up to 60 days after creation.
other…: other fields such as Amount, Customer, etc

We want to create one QVD per month using the name “Transactions-YYYY-MM.qvd”. What determines which QVD a transaction is placed in? Is it the MonthStart(TransactionDate)? It depends…

The simplest technique is for the extract script to place everything loaded today into the current month QVD, regardless of the TransactionDate. The QVD name is assigned to a variable in the script using:

Let vQvdName = 'Transactions-' & Date(Today(1),'YYYY-MM') & '.qvd';

When later loading 12 QVDs into the dashboard, load front (most current QVD) to back with the clause:

Where not Exists(TransactionID)

The Where clause will ensure that only the most current row for that TransactionID will be loaded.

This simple technique might be ok for most scenarios. But it’s not very robust because it falls down when you want to do something like a full reload to add columns, or data is loaded off schedule. It also would require that if want to load something like 6 months from the middle, we have to be careful to include enough later QVDs to cover possible updates.

A more robust approach would be to store each transaction row in the QVD corresponding with it’s TransactionDate. Here is one script pattern to do just that. Our starting point for this script is that we have already extracted the new and changed rows to create table “Transactions”.

Step #1 is to collect the month values into a temp table:

TempMonths:
LOAD Distinct
MonthStart(TransactionDate) as TranMonth
Resident Transactions; 

Next we process each TranMonth value in a loop block. The block will build a temp table of rows for just one month and merge with any existing QVD.

For i = 1 to FieldValueCount('TranMonth')
Let vMonthName = Date(FieldValue('TranMonth', $(i)), 'YYYY-MM');
Set vQvdName = Transactions-$(vMonthName).qvd;


MonthTransactions:
NoConcatenate LOAD * Resident Transactions
Where MonthStart(TransactionDate) = FieldValue('TranMonth', $(i));


If FileSize('$(vQvdName)') > 0 THEN // If we have existing QVD
LOAD * From [$(vQvdName)] (qvd)
Where Not Exists(TransactionID);
ENDIF


Store MonthTransactions into [$(vQvdName)] (qvd);
Drop Table MonthTransactions;
Next i

Drop Table TempMonths, Transactions; 

The above segmenting script would support incremental reload, full reload or a load of any data in between.

So now we have many “Transactions-YYYY-MM.qvd” files. How do we load the current 12 months? Do we wake up early on the 1st of each month and quick change the script? No. We create a dynamic script based off the current day.

For i = 0 to -11 step -1  // 12 Months
Let vMonthName = Date(AddMonths(Today(1), $(i)), 'YYYY-MM');
Transactions:
LOAD *
From [Transactions-$(vMonthName).qvd] (qvd);
Next i 

If we had built the QVDs using any technique that allowed for the possibility of duplicate TransactionID, we would add a guard of “Where not Exists()”.

...
From [Transactions-$(vMonthName).qvd] (qvd)
Where not Exists(TransactionID); 

What About IntraDay High Volume Reloads?

In scenarios with Intraday loading and high transaction counts, I prefer to defer merging QVDs to off-peak times.

Let’s take an example scenario of a customer who generates approximately 10 million transactions per day, with peak hours creating about 2 million transactions. The main dashboard should be refreshed hourly for twelve hours each day and should contain the last 10 days of transactions. Of course all data should be kept around for various summary analyses and ad-hoc projects.

It makes sense to segment these QVDs by day. Our hourly incremental load will need to merge with — read and write — a fairly large daily QVD. Crucially, the load time gets longer as the day progresses and the QVD gets larger. And now I hear rumors of twice hourly reload. This pattern has a bad smell.

What to do? Let’s store the hourly incremental rows in a hourly QVD of their own. The dashboard will then pick up all hourly QVDs plus required daily QVDs. Overnight, when we have some breathing room, we will run a script to consolidate the hourlies into a daily QVD.

The hourly incremental QVD is created like:

Let vQvdName = 'Hourly-Transactions-' & Timestamp(Now(1), 'YYYY-MM-DD-hh-mm-ss') & '.qvd'; 
Store Transactions into [$(vQvdName)] (qvd); 

Then the dashboard will load the new data using a wildcard load for the Hourly QVDs and a loop for the prior days:

// Load all Hourly QVDs
Load * From [Hourly-Transactions-*.qvd] (qvd);
// Load previous 9 days of Daily QVDs
For i = 1 to 9 // 9 Days
Let vDateName = Date((Today(1) -$(i)), 'YYYY-MM-DD');
Transactions:
LOAD * From [Transactions-$(vDateName).qvd] (qvd);
Next i 

Getting LastUpdate From a QVD

One of the steps in incremental loading is determining what “zzz” value to use in the SQL “Where LastUpdate >= zzz”. We need the high value from the last load. Some people store this information in a side file or variable. I think the most reliable approach is to get the high value from the existing QVD.

Getting Max(LastUpdate) from a very large QVD can take some time (how to do this the quickest is always an interesting pub question). My preferred technique is to store a new field “MaxLastUpdate” in the QVD and then read only the first row of the QVD to retrieve the value.

Getting and Joining Max(LastUpdate) should be fairly quick because we are only dealing with the incremental rows.

Transactions:
SQL Select * From db.transactions where LastUpdate >= foo;
Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 

The lastest MaxLastUpdate value can then be retrieved by reading only the first row of the existing QVD. Here’s how it looks all together using the example of monthly QVDs.

Let vMonthName = Date(Today(1), 'YYYY-MM');
TempMonth:
First 1 Load MaxLastUpdate
From [Transactions-$(vMonthName).qvd] (qvd);
Let vMaxLastUpdate = TimeStamp(Peek('MaxLastUpdate'), 'MM-DD-YYYY hh:mm:ss');
Drop Table TempMonth;


Transactions:
SQL Select * From db.transactiions
where LastUpdate >= '$(vMaxLastUpdate)'; 


Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 


// Merge or segment with existing QVDs

I hope you found some useful tips in this article. No doubt you have some ideas of your own, feel free to add comments.

Want to learn more advanced scripting techniques? After 2 years of virtual sessions, the Masters Summit for Qlik is back with live events this fall. In September we’ll be in Madrid, Spain, and in November we’ll be in New Orleans, USA. If you want to take your Qlik skills to the next level, get access to all sorts of ready-to-use solutions and reference materials, share experiences and maybe a few horror stories with your peers then we hope to see you there!

Share

Can I Update a Mapping Table?

Summary: Can you concatenate rows to a Mapping Table? The answer is “Yes” and I show you how.

Recently my Masters Summit colleague Oleg Troyanksy asked me “What are the limits on updating an existing Mapping Table in Qlik script?”.  My immediate answer was “You can’t update a Mapping Table after it’s been created”.  Then Oleg showed me a script that surprised me — adding to a mapping table in a loop,  implicit concatenation.  The loop works, but what if we need to load from multiple sources?

When the contents of a mapping table come from multiple Load statements, I have always advised doing standard loads to a temp table and then a final Mapping Load from the temp table.

Turns out you can concatenate to a mapping table.  Maybe that temp table is unnecessary.  I don’t find any doc on the topic. Here’s what I’ve found from experimentation.

We may wish to do something like this:

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Concatenate (MapX)
Mapping Load * Inline [
 from, to
 3, C
 ];

That script will fail with  “Table ‘Mapx’ not found” error. You cannot directly reference a mapping table this way.

Interestingly, if we leave off the “Concatenate (MapX)”, it will concatenate and result in the desired mapping.  Implicit concatenation will kick-in and the second load will add rows to the mapping table.  I’ve included some ApplyMap() code in this example so you can copy/paste and test for yourself.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Mapping Load * Inline [
 from, to
 3, C
 ];

Data:
 Load RecNo() as Recid,
 ApplyMap('MapX', RecNo()) as Mapped
 AutoGenerate 5;

The resulting output looks like this, proving that 3/C has been added to the map.

Unlike implicit concatenation for standard tables, the fieldnames  need not be the same. This script will concatenate. Note the different fieldnames in the second load.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Mapping Load * Inline [
 X, Y
 3, C
 ];

What if there is an intervening standard load?  Will concatenation occur?

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Fact:
 LOAD 1 as X AutoGenerate 1; 

Mapping Load * Inline [
 X, Y
 3, C
 ];

The answer is no, concatenation will not happen. The second Mapping Load will create a new invisible mapping table.

So if I  can’t name the mapping table in a Concatenate prefix, is there some other way to explicitly concatenate?  Turns out there is. This will work.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];

Fact:
 LOAD 1 as X AutoGenerate 1;

MapX:
 Mapping Load * Inline [
 X, Y
 3, C
 ];

When naming the mapping table again with the same label. explicit concatenation will occur!  This is unlike a standard load where repeating a label results in  a new table with “-1” suffix  (when fieldnames don’t match).

In summary, you can add rows to a mapping table. Repeating the table label again will ensure you are adding,  whether there is an intervening standard load or not.

Now you may be wondering …can I do an implied Join? I think not.

On Jan 20, 2021 I’ll be sharing more scripting tips and techniques in my “Advanced Scripting” on-line course as part of the “Masters Summit at Home” series.   Join me there for more on topics like creating reusable script and maintaining optimized loads.

Share

Creating Temporary Script Associations

Summary: I review using Join, Lookup() and ApplyMap() as script techniques  to calculate using fields from multiple tables. I ultimately recommend ApplyMap().

Qlik charts  can calculate values on the fly, using fields from multiple tables in the model.  The associative model takes care of navigating (joining) the correct fields together.  Our expression syntax doesn’t  identify what table a field exists in — the associative logic takes care of this detail for us.

We may want to calculate a measure, for example, “Net Amount”, applying a business rule requiring fields from several tables:

Our expression to calculate “Net Amount” might look like this:

Sum(Amount) - 
RangeSum(
  Sum(Quantity * UnitCost), 
  Sum(Amount * Discount), 
  Sum(Amount * SalesTaxRate), 
  Sum(Amount * ExciseTaxRate)
)

There may be cases (such as performance) where we want to pre-calculate “Net Amount” as a new field in the script.  In script, we don’t have the magic associative logic to assemble the fields.  When a script expression is used to create a new field, all fields must be available  in a single load statement.  This is straightforward when all the required fields are in the same table.  But what do we do when the fields come from multiple tables?

Here are three approaches to solving the problem of calculating a new field using multiple tables in script.

  • Join
  • Lookup() function
  • ApplyMap() function

I’ll demonstrate deriving the same “Net Amount” calculation in the script.

JOIN

The Join option will require us to execute multiple joins to assemble the fields onto each Orders row and then finally do the calculation.  The script might look like this:

Left Join (Orders)
LOAD
 ProductId,
 UnitCost
Resident Products
; 
Left Join (Orders)
LOAD
 CustomerId,
 Discount,
 State
Resident Customers
; 
Left Join (Orders)
LOAD
 State,
 SalesTaxRate,
 ExciseTaxRate
Resident States
;

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * UnitCost,
   Amount * Discount,
   Amount * SalesTaxRate,
   Amount * ExciseTaxRate
 ) as NetAmount
Resident Orders
;
// Drop the extra fields from Orders.
Drop Fields State, UnitCost, Discount, SalesTaxRate,ExciseTaxRate
From Orders
;

It’s a fairly good option.  It can be a lot of code depending on how many fields and tables we need to traverse. We need to be aware of “how many hops” between tables and may require intermediate joins (State field) to get to the final field (SalesTaxRate & ExciseTaxRate).

When using Join we need to ensure we have no duplicate keys that would mistakenly generate additional rows.

LOOKUP

Lookup() seems the most natural to me. It’s the least amount of code and it even sounds right: “look-up”.  It’s a one-to-one operation so there is no danger of generating extra rows.

It’s my least used option due to performance as we shall see.

Lookup takes four parameters  – a field to return, the field to test for a match, a match value to search for and the table to search.  Using Lookup() our script will look like this:

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * Lookup('UnitCost', 'ProductId', ProductId, 'Products'),
   Amount * Lookup('Discount', 'CustomerId', CustomerId, 'Customers'),
   Amount * Lookup('SalesTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States'),
   Amount * Lookup('ExciseTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States')
 ) as NetAmount
Resident Orders
;

Note that for SalesTaxRate and ExciseTaxRate, the third parameter — the match value — is another Lookup() to retrieve the State. This is how we handle  multiple hops, by nesting Lookup().

It’s a nice clean statement that follows a simple pattern.  It performs adequately with small volumes of data.

Lookup does have a significant performance trap in that it uses a scan  to find a matching value.  How long to find a value is therefore dependent on where in the field the value is matched.  If it’s the first value it’s very quick, the 1000th value much longer, the 2000th value exactly twice as long as the 1000th. It’s a bit crazy making that it executes in O(n) time, for which I prefer the notation U(gh).

APPLYMAP

I like to think of the ApplyMap() approach as an optimized form of Lookup().  We first build mapping tables for each field we want to reference and then use ApplyMap() instead of Lookup() in the final statement. Our script will look like this:

Map_ProductId_UnitCost:
Mapping
Load ProductId, UnitCost
Resident Products
;
Map_CustomerId_Discount:
Mapping
Load CustomerId, Discount
Resident Customers
;
Map_CustomerId_State:
Mapping 
Load CustomerId, State
Resident Customers
;
Map_State_SalesTaxRate:
Mapping 
Load State, SalesTaxRate
Resident States
;
Map_State_ExciseTaxRate:
Mapping 
Load State, ExciseTaxRate
Resident States
;
NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * ApplyMap('Map_ProductId_UnitCost', ProductId, 0),
   Amount * ApplyMap('Map_CustomerId_Discount', CustomerId, 0),
   Amount * ApplyMap('Map_State_SalesTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)),
   Amount * ApplyMap('Map_State_ExciseTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0))
 ) as NetAmount
Resident Orders
;

The mapping setup can be a lot of code depending on how many fields are involved. But it’s well structured and clean.

In the final statement, we are “looking up” the value using ApplyMap() and it performs very quickly.  ApplyMap uses a hashed lookup so it does not matter where in the list the value lies, all values perform equally.

We can re-structure and simplify the mapping setup and subsequent use with a subroutine like this:

Sub MapField(keyField, valueField, table)
// Create mapping table and set vValueField var // equal to ApplyMap() string.
 [Map_$(keyField)_$(valueField)]:
 Mapping Load [$(keyField)], [$(valueField)]
 Resident $(table);
 Set [v$(valueField)] = ApplyMap('Map_$(keyField)_$(valueField)', [$(keyField)]);
End Sub

Call MapField('ProductId', 'UnitCost', 'Products')
Call MapField('CustomerId', 'Discount', 'Customers')
Call MapField('CustomerId', 'State', 'Customers')
Call MapField('State', 'SalesTaxRate', 'States')
Call MapField('State', 'ExciseTaxRate', 'States')

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
 Quantity * $(vUnitCost),
 Amount * $(vDiscount),
 Amount * $(vSalesTaxRate),
 Amount * $(vExciseTaxRate)
 ) as NetAmount
;
LOAD
 *,
 $(vState) as State
Resident Orders
;

Note the use of the preceding load to handle the nested lookup of State.   You could also modify the Sub to handle some level of nesting as well.

I typically use the mapping approach as I find it always gives accurate results (with Join you must be careful of duplicate keys) and generally performs the best, and importantly, consistently.

Whether you are new to Qlik or an old hand I hope you found something useful in reading this far.

-Rob

 

 

Share