Category Archives: Performance

Optimized Load Script Patterns

While we have an increasing number of methods for creating Qlik Data Models, most are still built with plain old script, and we care about the elapsed runtime of that script.

Qlik QVD files load very fast when loaded in “optimized mode”. An optimized load can transfer the data from disk directly to Qlik’s internal memory format and is generally about 10x faster than a non-optimized load. A QVD is loaded optimized if:

  • No fields are added
  • No field transformations
  • No Where clause except for single parameter Exists(), or two parameter Exists() where both fields are in this Load.
  • Field Rename using AS is allowed

There are some common script requirements, e.g. filtering a QVD on a year value, that will break the optimization. What follows are some typical scenarios and script patterns to help you maintain optimized Loads.

Script patterns that help you maintain an optimized Load include:

  • Use Where Exists() to select rows. Subsetting with Join is possible but slower and uses more memory.
  • Use a subsequent Load Resident to add calculated fields.
  • AutoNumber statement
  • Touchless formatting vs inline formatting.

Selecting QVD Rows

The optimal pattern for filtering rows from a QVD is to load a temp table with values and filter the QVD load using Where Exists(field).

TempYear:
LOAD * INLINE [
Year
2022
2023
];

Data:
LOAD *
FROM [lib://data/sales.qvd] (qvd)
Where Exists(Year);

A slower option that allows for more conditions is an Inner Join.

Data:
LOAD *
FROM [lib://data/sales.qvd] (qvd);

INNER JOIN (Data) LOAD * INLINE [
Year, Region
2022, US
2023, US
];

When you want two separate tables that contain only rows intersecting on multiple fields, you can’t use Where Exists(field). Instead use the Keep prefix.

For example, Table “RepairOrders” has already been loaded. “RepairOrders” contains fields ModelId and PartId, linking to “PartsMaster.qvd”. Load matching rows from PartsMaster as a separate table, keeping the load optimized.

PartsMaster:
LEFT KEEP(RepairOrders)
LOAD *
FROM [lib://data/PartsMaster.qvd] (qvd);

Subsequent Load Resident

Create additional derived fields in a Load Resident after the optimized QVD Load.

TempSales:
LOAD *  
FROM [lib://data/sales.qvd] (qvd);

Sales:
LOAD *,
  Total - Cost as Margin
Resident TempSales;

Drop Table TempSales;

If you are certain your table contains no duplicate rows, you can create the new fields using Join instead of creating a new table:

Join (Sales) LOAD *,
  Total - Cost as Margin 
Resident Sales;

AutoNumber Statement

It’s usually a good idea to AutoNumber key fields to save memory. Don’t use the AutoNumber() function in a Load. Instead use the AutoNumber statement at the end of the script.

AutoNumber *fieldlist;

Note that you can use wildcards in the fieldname. This is really handy if your key fields follow a naming pattern like “Key*”.
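For example, a minimal sketch (assuming your key fields all begin with “Key”):

AutoNumber "Key*";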

Touchless Formatting

“Touchless formatting” is my invented term for the technique of formatting fields by using a temp format table. This is useful when you need to change or set specific date or number formats for QVD fields. If you add the formatting function to the QVD Load, you lose the optimized Load. You can read more about Touchless Formatting here.

Here is an example of changing the EU format dates in QVD fields to US format dates.

// Load some dummy fields just to assign formats
TempFormatTable:
LOAD
 Date(0, 'MM/DD/YYYY') as OrderDate,
 Date(0, 'MM/DD/YYYY') as ShipDate,
 Num(0, '#,##0.00') as OrderTotal
AutoGenerate 1;

Facts: // Load the QVD
LOAD * FROM [lib://data/OrdersEU.qvd] (qvd);

DROP TABLE TempFormatTable;  // Drop temp table

I hope you’ve found these tips a useful reference. Happy Scripting!


Reducing Qlik Field Memory

Summary: A look at techniques to reduce the memory footprint of your Qlik Data Models.

There are good reasons to reduce the memory footprint of your Qlik Apps. Smaller Apps tend to perform better and require fewer server resources. Qlik Cloud imposes size quotas and there is a financial cost to exceeding the quota.

In this post we’ll look at techniques to reduce the memory requirements of your data model.

First, let’s review how Qlik Sense stores App data in memory.

The Qlik Load script transforms your data into two types of internal data structures.

A Symbol Table is created for each field. The Symbol table contains a list of the distinct values of this field. The overall size of the symbol table is therefore a function of how many values and the width of the values.

Logical Tables are the tables we see created in the data model, i.e. fact and dimension tables. The Logical tables contain one row for each row of data. The tables do not contain the field values directly. Rather they contain a pointer to a value in the associated Symbol table. This pointer is sometimes called an Index.

The width of a pointer is the number of bits required to index the number of values in the Symbol table. For example, in the image above we have 5 values for “Year”. This would require 3 bits as 2^3 is 8, which would cover the required 5 values. 2^2 would be too small as that would only cover 4 values.

My QSDA Pro tool shows the memory required for each field, including a breakdown by Symbol and Index.

There are other tools to view field sizes such as Qlik Cloud App Performance Analyzer, Qlik Cloud App Analyzer and App Metadata Analyzer for Qlik client-managed. In my experience all of these report only on symbol space per field, not index space.

I’m going to use QSDA Pro for my demonstrations in this article.

Using less memory for fields involves reducing the size of the symbol table, the index, or both. There are four techniques we can employ to reduce field memory:

  • Eliminate the field
  • Reduce the number of field values
  • Optimize field values to reduce width, including AutoNumber
  • Reduce the number of rows

Let’s look at each of these options.

Eliminate the field

Removing the field from the data model is the easiest way to reduce memory :). Of course this is only viable if the field is not used by your application. To determine if a field is used, you must consider all the places a field may be referenced — master items, variables, charts, bookmarks, etc — for both public and private content. Fortunately, QSDA scans all public and private content and makes the in-use determination for you.

In some cases you will not want to drop a field as you anticipate it being used in the future. The benefit from removing fields is relative to the size of the field. When considering whether you want to drop a field, look to the largest fields first where you will gain the most benefit.

QSDA Drop Unused Fields Dialog

Reduce the number of field values

Reducing the number of distinct values will reduce both symbol and index space. How might we do this?

A common scenario involves timestamps. For example let’s look at a field named “ShipTime” which displays in our app as hh:mm:ss, reporting the time an order was shipped. The analysis requirement is that occasionally an analyst will need to see this value in an order detail chart. This field has 459,918 distinct values. Something seems off. There are only 86,400 possible hh:mm:ss values per day and we only ship during one shift so I would expect to see no more than 28,800 values.

When I examine the values in the app I discover the field contains an entire Datetime value and the display format hh:mm:ss is being used in the chart. I don’t need the date portion. I’ll use the Frac() function in the script to extract just the time.
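In script that might look something like this (a sketch; it assumes the source field currently holds a full datetime):

Time(Frac(ShipTime), 'hh:mm:ss') as ShipTime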

Reload followed by a new QSDA Analysis and here’s the result below. A massive reduction in distinct values and associated savings in both symbol and index. This reduces the total memory for this field from 5200KB to 345KB.

Another potential scenario for this field is that the original developer required both the date and time. In this case our approach is to split the datetime into separate Date and Time fields like this. Remember that formatting functions like Date() and Time() do not change values, we must use numeric functions like Floor() and Frac().
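A minimal sketch of that split (again assuming the source field holds the full datetime):

Date(Floor(ShipTime)) as ShipDate,
Time(Frac(ShipTime)) as ShipTime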

If I need to display the “Ship timestamp” in a chart, I’ll do it like this in the chart:

ShipDate & ' ' & ShipTime

A QSDA analysis now shows a total of 863KB for both fields. A big reduction from 5200KB!

Other potential ways to reduce the number of values (both sketched in script below):

  • Drop seconds or fractions from Time fields.
  • Some accounting systems return fractional amounts, or you may have unneeded fractions due to a calculation. Round the final amount to the penny or appropriate magnitude.
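In script, those two ideas might look something like this (field names are illustrative):

Time(Floor(EventTime, 1/1440), 'hh:mm') as EventTime,  // drop seconds, keep minutes
Round(Amount, 0.01) as Amount                           // round to the penny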

Optimize field values to reduce width

Reducing the width of the field values decreases the Symbol space but not the Index space. How can we reduce the width of a value without changing the business meaning of the value? Isn’t a value a value?

We’ll start with the easy way — AutoNumber. The AutoNumber statement (and AutoNumber function) translates values into a set of sequential integers. This is useful for compacting key values. We typically don’t need to see the key values, we just need them to link our tables together.

AutoNumbering a field is done by adding this statement at the end of your script:
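For example, a minimal sketch using the key field measured below:

AutoNumber SalesOrderDetailID;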

Here’s the before and after sizes for SalesOrderDetailID, showing a total reduction of 143MB.

Before AutoNumber
After AutoNumber

Note that the symbol space has been completely eliminated! When the symbol set consists of sequential integers, Qlik does not store the symbols. Instead, the index value is used as a proxy for the symbol value.

Now we need to dive a bit deeper into the actual format of a symbol table. The symbol table diagram at the top of this post is a bit of a simplification.

Generally speaking, strictly numeric values can be stored in 8 byte cells. For a field that contains only numeric values the symbol table is an array of 8 byte values.

We can index to the nth value simply by multiplying by 8. This is a very efficient and compact storage format. In QSDA these fields will show an Average Symbol Width of 8.00.
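For example, the 1,000th symbol of a purely numeric field starts at byte offset 999 * 8 = 7,992.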

QSDA Fields

For fields that contain strings, the symbol table layout is a bit different. As strings can be of varying length, an array of fixed cells won’t do. If there are any strings in the field, the entire field is considered “mixed”.

The symbol table for a mixed field is an array of 4 byte pointers that point to symbol values elsewhere in memory. Each value consists of:

  • 1 byte flag
  • The varying length string
  • 1 byte null terminator

The total symbol space for a string is 6 bytes plus the length of the string. The storage for string “a” would be 7 bytes.

Looking at the “LastName” field in the image above, we know that each symbol value carries a 6 byte overhead. We can infer that the average length of the LastName strings is 6.54. That is, 12.54 – 6.

When the value is a Dual() value, the symbol value picks up an extra 4 or 8 bytes to hold the numeric value. 4 bytes is used for integers, 8 for decimal values.

The total symbol space for a dual value is 6 bytes plus the length of the string, plus 4 or 8 bytes.

A field may contain both string and dual types. Dual is a value attribute, not a field attribute. For more on that topic see “Dual Storage vs Dual Behavior“.

Ok, we’ve just gone down a deep rabbit hole. Maybe interesting, but is this important to reducing the overall memory footprint of your app? Generally no, sometimes yes. Let’s look at some examples.

Sometimes fields that are strictly numeric get inadvertently loaded as Dual() values. Certain operations, including where clauses in loads, can cause this behavior. Frankly, I don’t have a comprehensive understanding of all the ways this can happen. But when it does happen, we can see numeric fields having a symbol size of more than 8.

The typical way to return these fields to 8 bytes (if you can’t fix the root cause of the issue) is to wrap the field with Num() or +0 when loading.

Num(UnitPrice) as UnitPrice
UnitPrice+0 as UnitPrice

I prefer neither approach. I typically use a TempFormat table instead.
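A sketch of that approach for this field, following the Touchless Formatting pattern from the first post above (the QVD path is illustrative):

TempFormatTable:
LOAD Num(0, '#,##0.00') as UnitPrice
AutoGenerate 1;

Facts:
LOAD * FROM [lib://data/Facts.qvd] (qvd);

DROP TABLE TempFormatTable;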

Is it worth the trouble to fix these? At first glance it may look like a big deal, but the memory savings is typically small.

The UnitCost field occupies 40MB and the symbol width should be 8 instead of 13.45. Almost all of the storage is in the Index Bytes. The field has a relatively small number of values. Making those 2,011 values 8 bytes each would save me a total of 10967 Symbol Bytes. (27055 – (2011 * 8))

All that said there are occasions when you have a larger number of distinct values and this can make a significant difference. I almost always “fix” these fields without giving it too much thought as it’s easy to do with a tool like QSDA.

Reduce the number of rows

Reducing the number of rows that reference a field decreases the Index space, but not the Symbol space.

Wait! Are you suggesting I load less data? No, I’m talking about data model changes that will not change the business value.

A typical opportunity to reduce rows is when you have very sparse data on a large fact table. Consider an OrderDetail table that includes return information on each Order Line. Only 0.5% of Order Lines get returned, so we end up with a lot of Index space pointing to nothing. This is indicated by a low Information Density for the table field.

QSDA Table Fields

These four Return fields in the SalesOrderDetail table require 23.28MB.

By moving these fields to a new “Returns” table, linked by SalesOrderDetailID, the Return fields and the linking key now require 227KB — a savings of 23MB, a relatively significant amount.
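A sketch of that split in script (the Return field names are illustrative):

Returns:
LOAD
  SalesOrderDetailID,
  ReturnDate,
  ReturnReason,
  ReturnQty,
  ReturnAmount
Resident SalesOrderDetail
Where Len(ReturnDate) > 0;  // keep only rows that actually have return data

DROP FIELDS ReturnDate, ReturnReason, ReturnQty, ReturnAmount FROM SalesOrderDetail;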

I hope you found some useful tips in this post. If you want to learn more about how QSDA Pro can help you optimize your Qlik Apps, join me on Nov 19 for the Click Less, Qlik More – QSDA Pro webinar or reach out to us.

-Rob


Reducing Qlik App Resources with QSDA

Longish post today where I show off the power of QSDA Pro to reduce Qlik App resources, saving you money and creating a more responsive experience for your users.

As a follow on to my last post “Help! My Qlik Cloud Reload Exceeds Quota!” I’m going to dive deeper into the available QSDA features to reduce memory requirements of a Qlik App. As a bonus we’ll use QSDA to improve the performance of a chart.

There are many good reasons to optimize your Qlik Apps to a leaner size. Smaller apps tend to perform better and place less demand on your server resources. If you are a Qlik Cloud customer, or planning a move to Qlik Cloud, App memory size impacts your license tier — and therefore your direct monthly costs.

In this post our starting point is that I have identified the App “BikeShop Sales” as having a large (597 MB) footprint. I got this information from Qlik App Metadata Analyzer as shown below.

Qlik App Metadata Analyzer

I could also have identified this memory footprint in QSDA Pro by selecting this App from the list.

QSDA Applist

Let’s see how QSDA Pro can help. I’ll launch a new Analysis and give it the Description “Baseline”.

When the analysis completes press “View” to see the results.

QSDA Analysis Summary page

In the Resources card I see that unused fields occupy 383 MB — almost 2/3 of the base data model! Unused fields are fields that are not used anywhere in the front end. They can be removed without impacting the contents of charts or the user experience.

I also see some other suggested savings in dropping unused tables (“All fields unused but keys — 439 KB”) and a 530 KB savings by AutoNumbering keys.

A fast and simple way to remove unused fields is to drop them at the end of the script with a “DROP Fields …” statement. AutoNumbering is also best done at the end of the script. Let’s implement the QSDA recommendations. Open the Qlik Data Load Editor for “BikeShop Sales” in another browser tab.

The QSDA Tools menu provides a handy Script Generator to generate recommended script statements. Select the script generator from the Tools menu.

QSDA Tools menu

In the Script Generator > AutoNumber tab select the top checkbox to select all recommended fields. Press the “Copy” button to copy the generated AutoNumber statement to your clipboard. Paste the statements at the end of your Qlik Script. Repeat for the Drop Fields and Drop Tables tabs. (I won’t use the Load Statement tab just yet).
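The pasted statements might look something like this (names are illustrative; the real lists come from the Script Generator):

AutoNumber SalesOrderID, SalesOrderDetailID, CustomerID;
DROP FIELDS rowguid, ModifiedDate;
DROP TABLE Notes;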

Now that I have updated the script with the AutoNumber and Drop statements, Reload.

Reload with AutoNumber, Drop Fields, Drop Tables

Returning to the QSDA App Card, I can see that Base App Memory is now 213 MB. Quite the savings from 507 MB!

App card after reload

At this stage, I recommend running a new analysis to confirm that the script changes have not introduced any problems or new issues.

Note that Reload Peak Memory is unchanged, because we loaded the unused data before we dropped it. Reload memory has an impact on the performance of on-prem servers, and for Qlik Cloud, reload memory is a direct cost.

Can we use less reload memory by not loading the unused fields and tables? It may be that a field is used in the load script but not in the front end, so you have to examine your script to determine this. This very clean script loads directly from QVDs so the answer is clear.

I’ll start by commenting out (or removing) the tables we don’t need instead of dropping them at script end. To identify the unused tables I can refer to the previously generated script, the Flags, or return to the Script Generator tool in the Baseline analysis.

What about Tables with both used and unused fields? Can I easily load just the used fields and omit the unused fields? QSDA Script Generator to the rescue again. Open the Script Generator > Load Statement in the Baseline analysis and sort on the “Unused Bytes” column. Select the SalesOrderDetail table.

A Load statement fieldlist for [SalesOrderDetail] will be created with unused fields commented out. You can exclude the unused fields entirely by unchecking the “Include Unused Fields” box. I’ll start with just the SalesOrderDetail table for now. Note that the other tables with big Unused Fields savings, Notes and Customers, are already being excluded completely.

Before reloading, I’ll remove the AutoNumber and Drop statements I added earlier. Many of those drops are no longer valid as I am not loading the fields at all. Here is the updated load script.

Updated script to exclude unused fields

After reloading, I return to the QSDA App Card. Reload Mem is reduced from 693 MB to 268 MB! I’ll run another analysis to see where things stand.

I can address the remaining unused fields with DROP Fields or using the Load Statement method. I will add the AutoNumber statements back in as well. I hope the process is clear enough now so I won’t show these remaining steps in detail here.

Let’s turn our attention to another great value of QSDA — Performance improvement. The “Calc Time Distribution” chart plots the calculation time of each chart in the App. I see I have an outlier that takes about 4 seconds to calculate.

What is this chart and why is it taking so long to calculate? Flip over to the Viz page and sort by the Calc Time column. The viz in question is a Table named “Salesperson Scorecard” on sheet “Dashboard” and it takes 4.142 seconds to calculate. Expanding the row shows the chart Dimensions and Measures and also exposes the “Viz Deconstruction” icon.

Viz Details

Clicking the Viz Deconstruct icon will calculate each chart Measure in turn and tell us what is taking so long to calc in this chart. We can see the total calc time is over four seconds and measures three and four take considerably longer than the first two. But why?

Let’s focus on this chart to get some insight into the cause and solution of the poor performance. Back on the Viz page, Right-click the Viz of interest and select “Filter to Selected”. This will filter our pages to only those objects — Dimensions, Expressions, etc — associated with this Viz.

Click over to the Expressions page and we’ll see that our long running expressions have flags. Flags are the advice system of QSDA, identifying errors, bad practices and performance issues. Our two second expression “Longest Deal” has a “Key Field used in Expression” flag.

Clicking the “i” icon next to the flag will open the Flag Details panel which contains a link to the help article for this flag.

In the help article we’re informed that using a key field in an expression can sometimes cause poor performance. The article goes on to advise using a non-key field instead, creating a new field if necessary. I’ll take this advice and create a new field in the SalesOrderHeader table:
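The new field is just a non-key copy of the order key; a minimal sketch, assuming the key field is named SalesOrderID:

// Added to the SalesOrderHeader load
SalesOrderID as SalesOrderRecord,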

Reload and change the chart Measure to use the new field.

  Max(Aggr(Sum(LineSales), SalesOrderRecord))

Run a new analysis to check the calc time of the chart (and check we didn’t impact anything else!). In the summary we’ll see a significant decrease in total app calc time and we can drill into our chart to see what’s changed.

Whoa! That measure has gone from 2 seconds to 0.008 seconds.

But Rob, didn’t I just blow up my RAM by making a copy of this big field? Don’t guess, use QSDA to see exactly what the cost is.

Additional RAM of SalesOrderRecord field

The cost is 50 KB of RAM and no significant increase in Reload Mem. Was it worth it? I think so.

If you choose to put the new field on the SalesOrderDetail table instead of SalesOrderHeader, the QSDA results will quickly show you that this was a bad idea: RAM is much increased and performance is not as good.

QSDA Pro gives you the tools to:

  • Reduce your App memory and compute requirements.
  • Get focused advice and insight for improving your App.
  • Compare different solutions and measure the impact of change.
  • Track your progress and document the impact of your work.

I hope this post makes it clear why I say that QSDA Pro is “the tool that every Qlik developer needs”.

Do you have questions about QSDA Pro or want to schedule a demo for your team? Reach out to us.

If you’re attending Qlik Connect in Orlando, stop by the Motio booth on the show floor. I’ll be there (guy in the hat) to answer your questions and show you even more features of QSDA Pro that can help you create leaner, cleaner and faster Qlik Apps.

-Rob


Help! My Qlik Cloud Reload Exceeds Quota!

If you are a Qlik Cloud (SaaS) customer you may have seen this dreaded message during script reload. Ack! Ack!

Your quota may be different from 25 MB, but the problem remains the same. How do I modify this application to fit within my subscribed quota?

QSDA Pro V3.1 to the rescue! QSDA analyzes your app and provides recommendations and easy-to-use tooling to drop unused data and optimize the remaining data, potentially greatly reducing the size of your Qlik App.

QSDA needs a successfully loaded app for analysis, so our first step is to reload the app using the “Limited load” feature of the Qlik script debugger.

Click the Debug button in the Qlik script editor, check the Limited load option, accept the default of 100 rows and click “Run” to initiate the reload. The reload will come in under your quota. When the reload completes, key Ctrl-s to persist the data.

You don’t have a business-ready app, but this is enough for QSDA to scan all charts, expressions and master items to determine what data is actually required.

In QSDA Pro we now select our Cloud connection and select the app of interest, in this case “BikeShop Sales2”. Click the “Open” button to prepare the app for analysis. I will also uncheck the “Collect Calctime” option because chart calculation times are not meaningful with our limited data. Description is optional, in this case I’ve entered “Limited Load”.

Click the “New Analysis” button to launch the QSDA Pro analysis. The analysis may take a minute or two to complete depending on the capacity of your tenant and the current Qlik Cloud load. When the analysis is complete press the “View” button to see the results.

In the Summary view we see there are 89 unused fields. Unused fields are fields that are not used in any charts, master items or expressions. These are fields that can be dropped at the end of script without impacting the user experience in the app. The actual storage numbers (KiB) are not meaningful because we only loaded 100 rows. The bar chart is useful in that it indicates about 2/3 of our data is unused. Hope!

QSDA provides a Script Generator in the Tools menu to generate a “DROP Fields …” script statement for unused fields. Select the script generator from the Tools menu.

In the Script Generator > Drop Fields tab select the top checkbox to select all recommended fields. Press the “Copy” button to copy the generated Drop Fields statement to your clipboard.

Paste the Drop Fields statement at the end of your Qlik Script and reload.

Reload successful! Victory!

A subsequent QSDA analysis shows the app size has been reduced from the problematic 37 MB to 15 MB!

QSDA Pro can quickly and efficiently get your Qlik Apps below your Qlik Cloud quota by pruning unnecessary data. Don’t pay for more than you need.

In a follow up post I’ll walk through some additional QSDA features to help you reduce Qlik app footprint even further and improve chart response time to boot.

Download QSDA Pro and try it for yourself. Or reach out to us to learn more about license options or arrange a demo or POC for your team.

-Rob

Are you going to Qlik Connect? I’ll be at the Motio booth on the show floor ready to demo QSDA Pro or answer any questions you may have. Or just to say Hej 🙂


Mind the Concat() sort-weight

Summary: While looking into long expressions I noticed that the optional sort-weight argument has an impact on the distinctness of Concat(distinct…). Incorrect use of sort-weight can generate bloated expressions containing redundant code.

In my work tuning Qlik Apps I sometimes encounter very long expressions. An expression many thousands of characters long can be difficult to debug, and its goal hard to comprehend. To help in working with long expressions I’ve added an Expression histogram and an Expression Decoder feature to my QSDA Pro product. (These features are currently in beta, generally available in early Feb).

I’ve noted expressions of length greater than 50k across apps from different customers. What did these expressions have in common that made them so large?

  • They used the Concat() function in $() to dynamically generate a part of the expression.
  • They used the optional sort-weight argument of Concat() incorrectly.
  • They were much bigger than necessary — sometimes 100x — but the expanded expression worked as intended.

In the process of reviewing the expressions I learned something surprising. As a reminder here’s the syntax of the Concat function:

Concat({[SetExpression] [DISTINCT] [TOTAL [<fld{, fld}>]]} string[, delimiter[, sort_weight]])

We use the DISTINCT keyword to return the unique set of values from the string argument (usually a field). The Qlik documentation for DISTINCT says:

If the word DISTINCT occurs before the function arguments, duplicates resulting from the evaluation of the function arguments are disregarded.

https://help.qlik.com/en-US/sense/November2022/Subsystems/Hub/Content/Sense_Hub/ChartFunctions/StringAggregationFunctions/concat.htm

This means that the set of distinct values is the combinations of string and sort_weight (if used). Let me demonstrate with an example. Here’s a sample data table.
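Here is a plausible version of that table (values assumed to match the outputs below), plus the island table used later:

SampleData:
LOAD * INLINE [
RecId, Dim, Weight
1, a, 1
2, b, 2
3, c, 3
4, c, 3
5, c, 3
];

Island:
LOAD * INLINE [
IslandField
1
2
];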

For the expression: Concat(Dim, ',') we receive output "a,b,c,c,c".

Adding the DISTINCT keyword: Concat(DISTINCT Dim, ',') we now get "a,b,c".

Adding a non-distinct sort-weight argument: Concat(DISTINCT Dim, ',', RecId) we now get "a,b,c,c,c" again. More output than I expected. It’s a distinct list of the combinations of Dim and RecId.

Adding a distinct sort-weight argument: Concat(DISTINCT Dim, ',', Weight) we now get "a,b,c".

How about if we used an unlinked data island field for sort-weight? The Island field has two values.

Concat(DISTINCT Dim, ',', IslandField) returns "a,b,c,a,b,c". Item count is the product of Dim * IslandField values. Remember this for later.

Ok, this is all very interesting but the behavior is super obvious and I would notice it if it came up in my App. What’s this got to do with ginormous expressions?

Developers sometimes use Concat along with Dollar Sign Expansion (DSE) to generate dynamic expression fragments. For example to ignore all fields from several tables in a set modifier:

Sum ({<
$(='[' & concat({<$Table={'Table1', 'Table2', 'Table3'}>}$Field,']=,[') & ']=')
>} Value)

Sometimes $(=Concat(...)) is used to build the list inside a Pick() or Match(). These types of expressions frequently have awkward syntax including lots of “& chr(39) &” type stuff. Inevitably the expression gets built by copying and modifying an expression from elsewhere in the App. An expression that contains a sort-weight. A sort-weight that doesn’t get removed. It may be an island field or a field that has a many-to-one relationship. The result is an expanded expression that works but is larger than it needs to be. No one notices (unless they use QSDA Pro) because it’s the expanded expression.

As a simple example, suppose the “ignore filter” expression above was supposed to generate something like "Quarter=,Month=,Year=". If I inadvertently use a sort-weight field that has 100 distinct values the result will be repeated 100 times. The expression would still work but it would be 100x larger than necessary.

I recently found an example where Concat was used to generate an If() function from data (very clever) that should have had 15 branches. But an unrelated sort-weight field of 95 values resulted in 1425 branches! It “worked” but did a lot of unnecessary calculation.

If you are a solo developer or working in a small team you may never encounter this issue. But if you are a consultant or maintaining legacy Apps you may stumble across it. I’ve been playing with ways to flag this condition in QSDA Pro. QSDA already flags data island expressions. I’m testing creating a new flag specifically for Concat().

My colleague Oleg Troyansky uses QSDA in his Performance Tuning session at the Masters Summit for Qlik. Live events will return in Fall 2023 but in meantime you can attend individual workshops on-line during February through April. More information and schedule here.

Happy Qliking
-Rob


Expression as Left Side of Set Modifier

Can I use an expression as the “fieldname” in a Set Modifier? In “<x={y}>” can “x” be an expression?

I always believed the answer to be “No”, x may only be a field name. That understanding is reinforced by what I read in the help doc for “set modifier”.

Yesterday I was surprised to discover that in Qlik Sense, this is a valid expression that returns sales for year 2015.

sum({<"=Year(OrderDate)"={2015}>}Sales)

This also works:

sum({<"=left(Country)"={'M'}>}Sales)

This is all news to me. And very interesting.

I stumbled across this accidentally when using the Expression Editor > Set Analysis > Insert button. I had selections in the derived field OrderDate.autocalendar.Year field. The set generated by the Insert tool was:

{<"=Dual(Year([OrderDate]),YearStart([OrderDate]))"={'2014','2015'}>}

That expression is the derived field definition that was specified in the script.

I have not yet formulated an opinion as to whether this is useful, or if there are any cautions or limitations when using it. I’m at the curious stage at this point and will look into it more as well as read comments I am sure will follow.

-Rob


How to Segment QVD Files

Summary: In this post I discuss when you may want to segment or “shard” QVD files and demonstrate some script patterns for producing and consuming segmented QVDs.

I recently received a question from a colleague who knew that I had done a lot of work with Qlik QVD files. He asked, “what’s the optimal size for a QVD file?”

My response was that in my experience there is no optimal physical size, but in many cases there are reasons to break a large QVD into multiple, smaller files. Dividing a large file into smaller files is called “segmenting” or “sharding”.

People generally start to think about segmenting QVDs when they encounter resource constraints while updating a QVD with an incremental load. In an incremental load, only new or changed rows are extracted from the database and merged with an existing QVD. This involves reading and writing the entire large QVD which can use significant time and I/O resources. This also means that the process takes increasingly longer as time marches on. Not pleasant.

Other reasons you may want to segment are consumption and archival patterns. It’s common to use the “latest n periods” of data in your active dashboard, for example the current 12 months. If the data is in a single large QVD, you have to roll off (delete) data older than 12 months. You can filter as you load, but that becomes an increasingly wasteful activity over time.

It’s likely that you will want to retain the older data for use in special projects or historical dashboards.

Given the example scenario above, it would make sense to create one QVD for each month. This will provide predictable performance for incremental updates as well as the dashboard load. Older data could be kept forever and any set of months could be accessed efficiently.

How do we perform this magic segmentation? Let’s assume an example QVD with these fields:

TransactionID: Primary key
TransactionDate: Original date of transaction
LastUpdate: Timestamp of last update to this row. Transactions may receive updates up to 60 days after creation.
other…: other fields such as Amount, Customer, etc

We want to create one QVD per month using the name “Transactions-YYYY-MM.qvd”. What determines which QVD a transaction is placed in? Is it the MonthStart(TransactionDate)? It depends…

The simplest technique is for the extract script to place everything loaded today into the current month QVD, regardless of the TransactionDate. The QVD name is assigned to a variable in the script using:

Let vQvdName = 'Transactions-' & Date(Today(1),'YYYY-MM') & '.qvd';

When later loading 12 QVDs into the dashboard, load front (most current QVD) to back with the clause:

Where not Exists(TransactionID)

The Where clause will ensure that only the most current row for that TransactionID will be loaded.

This simple technique might be ok for most scenarios. But it’s not very robust because it falls down when you want to do something like a full reload to add columns, or data is loaded off schedule. It also would require that if we want to load something like 6 months from the middle, we have to be careful to include enough later QVDs to cover possible updates.

A more robust approach would be to store each transaction row in the QVD corresponding with its TransactionDate. Here is one script pattern to do just that. Our starting point for this script is that we have already extracted the new and changed rows to create table “Transactions”.

Step #1 is to collect the month values into a temp table:

TempMonths:
LOAD Distinct
MonthStart(TransactionDate) as TranMonth
Resident Transactions; 

Next we process each TranMonth value in a loop block. The block will build a temp table of rows for just one month and merge with any existing QVD.

For i = 1 to FieldValueCount('TranMonth')
Let vMonthName = Date(FieldValue('TranMonth', $(i)), 'YYYY-MM');
Set vQvdName = Transactions-$(vMonthName).qvd;


MonthTransactions:
NoConcatenate LOAD * Resident Transactions
Where MonthStart(TransactionDate) = FieldValue('TranMonth', $(i));


If FileSize('$(vQvdName)') > 0 THEN // If we have existing QVD
LOAD * From [$(vQvdName)] (qvd)
Where Not Exists(TransactionID);
ENDIF


Store MonthTransactions into [$(vQvdName)] (qvd);
Drop Table MonthTransactions;
Next i

Drop Table TempMonths, Transactions; 

The above segmenting script would support incremental reload, full reload or a load of any data in between.

So now we have many “Transactions-YYYY-MM.qvd” files. How do we load the current 12 months? Do we wake up early on the 1st of each month and quick change the script? No. We create a dynamic script based off the current day.

For i = 0 to -11 step -1  // 12 Months
Let vMonthName = Date(AddMonths(Today(1), $(i)), 'YYYY-MM');
Transactions:
LOAD *
From [Transactions-$(vMonthName).qvd] (qvd);
Next i 

If we had built the QVDs using any technique that allowed for the possibility of duplicate TransactionID, we would add a guard of “Where not Exists()”.

...
From [Transactions-$(vMonthName).qvd] (qvd)
Where not Exists(TransactionID); 

What About IntraDay High Volume Reloads?

In scenarios with Intraday loading and high transaction counts, I prefer to defer merging QVDs to off-peak times.

Let’s take an example scenario of a customer who generates approximately 10 million transactions per day, with peak hours creating about 2 million transactions. The main dashboard should be refreshed hourly for twelve hours each day and should contain the last 10 days of transactions. Of course all data should be kept around for various summary analyses and ad-hoc projects.

It makes sense to segment these QVDs by day. Our hourly incremental load will need to merge with — read and write — a fairly large daily QVD. Crucially, the load time gets longer as the day progresses and the QVD gets larger. And now I hear rumors of twice hourly reload. This pattern has a bad smell.

What to do? Let’s store the hourly incremental rows in a hourly QVD of their own. The dashboard will then pick up all hourly QVDs plus required daily QVDs. Overnight, when we have some breathing room, we will run a script to consolidate the hourlies into a daily QVD.

The hourly incremental QVD is created like:

Let vQvdName = 'Hourly-Transactions-' & Timestamp(Now(1), 'YYYY-MM-DD-hh-mm-ss') & '.qvd'; 
Store Transactions into [$(vQvdName)] (qvd); 

Then the dashboard will load the new data using a wildcard load for the Hourly QVDs and a loop for the prior days:

// Load all Hourly QVDs
Transactions:
Load * From [Hourly-Transactions-*.qvd] (qvd);
// Load previous 9 days of Daily QVDs
For i = 1 to 9 // 9 Days
Let vDateName = Date((Today(1) -$(i)), 'YYYY-MM-DD');
Transactions:
LOAD * From [Transactions-$(vDateName).qvd] (qvd);
Next i 
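The overnight consolidation mentioned above might look something like this sketch (it assumes the file naming from the earlier examples and that each TransactionID appears in only one hourly QVD):

// Consolidate yesterday's hourly QVDs into a single daily QVD
Let vDateName = Date(Today(1) - 1, 'YYYY-MM-DD');

DayTransactions:
LOAD * From [Hourly-Transactions-$(vDateName)-*.qvd] (qvd);

Store DayTransactions into [Transactions-$(vDateName).qvd] (qvd);
Drop Table DayTransactions;
// The consolidated hourly QVDs can then be archived or deleted.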

Getting LastUpdate From a QVD

One of the steps in incremental loading is determining what “zzz” value to use in the SQL “Where LastUpdate >= zzz”. We need the high value from the last load. Some people store this information in a side file or variable. I think the most reliable approach is to get the high value from the existing QVD.

Getting Max(LastUpdate) from a very large QVD can take some time (how to do this the quickest is always an interesting pub question). My preferred technique is to store a new field “MaxLastUpdate” in the QVD and then read only the first row of the QVD to retrieve the value.

Getting and Joining Max(LastUpdate) should be fairly quick because we are only dealing with the incremental rows.

Transactions:
SQL Select * From db.transactions where LastUpdate >= foo;
Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 

The latest MaxLastUpdate value can then be retrieved by reading only the first row of the existing QVD. Here’s how it looks all together using the example of monthly QVDs.

Let vMonthName = Date(Today(1), 'YYYY-MM');
TempMonth:
First 1 Load MaxLastUpdate
From [Transactions-$(vMonthName).qvd] (qvd);
Let vMaxLastUpdate = TimeStamp(Peek('MaxLastUpdate'), 'MM-DD-YYYY hh:mm:ss');
Drop Table TempMonth;


Transactions:
SQL Select * From db.transactions
where LastUpdate >= '$(vMaxLastUpdate)'; 


Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 


// Merge or segment with existing QVDs

I hope you found some useful tips in this article. No doubt you have some ideas of your own, feel free to add comments.

Want to learn more advanced scripting techniques? After 2 years of virtual sessions, the Masters Summit for Qlik is back with live events this fall. In September we’ll be in Madrid, Spain, and in November we’ll be in New Orleans, USA. If you want to take your Qlik skills to the next level, get access to all sorts of ready-to-use solutions and reference materials, share experiences and maybe a few horror stories with your peers then we hope to see you there!


If() Tips

Summary:  I offer some tips for writing better performing and easier to maintain syntax when using the Qlik If() function. 

The Qlik If() function is very powerful and  frequently appears in Qlik Sense and QlikView apps.

Expressions using multiple If() functions can easily get out of hand and become difficult to maintain or debug, as well as poor performers.

In this post I’ll offer some advice on avoiding If() pitfalls and tips to write easier to understand expressions.

The Qlik syntax diagram for the If function is:

if(condition , then [, else])

That’s perfectly clear to most people, but I prefer to think of it more like:

if(condition , true result [, false result])

Tip#1: If() does not short circuit.

Both the true & false branches are calculated even when only one is possibly true.  For example:

If(Only(Currency) = 'LC', Sum(Sales), Sum([Sales LC]))

In this case both Sum() expressions will be calculated even though only one value will be utilized.  In most cases this behavior is not of concern and in many applications will perform very well.   However, a nested If() with many possible branches or a large data set may perform poorly.

For more on the short circuit issue see “How to Choose an Expression“.

 

Tip#2: Use indentation sparingly.

The true or false result may be an additional, “nested” If(), which is where we start to see some ugly syntax.  Following traditional programming conventions many people automatically indent the nested if like this:

If(Sum(Sales) > 100000, 'Large',
    If(Sum(Sales) > 75000, 'Med', 
      If(Sum(Sales) > 50000, 'Demi',  'Small')
    )
)

Essentially,  the expression above classifies into one of four values.  I don’t think indentation  adds to the readability and indentation will lead you into “tab hell” when you get many possibilities.  I prefer to write this expression as:

If(Sum(Sales) > 100000, 'Large'
,If(Sum(Sales) > 75000, 'Med' 
,If(Sum(Sales) > 50000, 'Demi'
,'Small'
)))

No indentation, all the closing right parens collected on one line at the end. Makes it very easy in the expression editor to see that you have the right number of parens.

The leading (vs trailing) commas are my personal preference.  This makes it easier to comment out logic and in my view, the comma belongs to the If that follows it, not the preceding If.

I think the above syntax makes it very easy to understand that I am choosing  one of four results, and what the rule is for each result.  Syntactically each If() is the else parameter of the preceding If().  I don’t think of the Ifs as “combined”, rather as “sequential”.

Do indent when you are using If() as the then parameter,  as shown in Tip#4 below.

 

Tip#3: Simplify by testing from high to low. 

The business rule that created this sample expression may have been stated to the Qlik developer like this:

“Classify sales of 0 to 50 000 as ‘Small’, 50 001 to 75 000 as ‘Demi’, 75 001 to 100 000 as ‘Med’ and above 100 000 as ‘Large’.”

The developer may faithfully translate the requirement into this expression.

If(Sum(Sales) > 0 and Sum(Sales) <= 50000, 'Small'
,If(Sum(Sales) > 50000 and Sum(Sales) <= 75000, 'Demi'
,If(Sum(Sales) > 75000 and Sum(Sales) <= 100000, 'Med'
,'Large'
)))

This returns the correct result. Testing from low to high values forces the use of “and” which makes the expression more complex than necessary and potentially slower to execute.  In my experience, testing from high to low, as in the Tip#2 example, yields a cleaner syntax.

 

Tip#4: Use “and” when you mean and.

Here’s a sample expression requirement:

When Sales > 1000 and Region=’US’, it’s “Mega US”. When Sales > 750 and Region = ‘UK’, it’s “Mega UK”. Otherwise it’s “General”.

I have seen this written as:

If(Sum(Sales) > 1000, 
    If(Region = 'US', 'Mega US'),
If(Sum(Sales) > 750, 
    If(Region = 'UK', 'Mega UK'), 
'General'))

While the “and” requirement may be satisfied with a then-if  nesting, I find it clearer with the “and” keyword.

If(Sum(Sales) > 1000 and Region = 'US', 'Mega US'
,If(Sum(Sales) > 750 and Region = 'UK', 'Mega UK' 
,'General'
))

What if the requirement for both US & UK were 1000?  You could argue that this is a clear case for nesting in that there is a shared condition and perhaps it would be a good practice to not repeat ourselves on the Sum(Sales).

If(Sum(Sales) > 1000, 
    If(Region = 'US', 'Mega US',
    If(Region = 'UK', 'Mega UK', 'General')), 
'General')

Notice  we needed to repeat the ‘General’ result to cover the null case.  So it’s not super clean, but it may be worth it to not repeat the sum(Sales) calculation.  Generally I find the performance difference between “and” and “nested if” to be insignificant and tend to favor whatever is the clearer syntax for the given requirement.

What about Pick(Match())? 

I’ve heard it occasionally claimed that a Pick/Match combination will run faster than a nested If.   The expression might look like this:

Pick(
    Match(
      -1
      ,Region= 'US' and Sum(Sales) > 1000
      ,Region= 'UK' and Sum(Sales) > 1000
      , -1
    )
,'Mega US', 'Mega UK','General')

In my own testing and reading I’ve never found any performance advantage to Pick/Match.  That said, sometimes the syntax is appealing.

One thing I don’t like about Pick/Match is the distance between the condition list  and the result list. It’s fairly easy to get the lists  mis-aligned as the expression grows.

I  wish Qlik had a Switch type function like:

Switch (
  condition1 : result1
  [,condition2 : result2, ...]  
  [: defaultResult]
)

 

Tip#5: Simplify by using Column(n) or Measure Name

If your if() refers to something that has already been calculated in the chart, you can use the Column(n) function to refer to the value of a measure/expression column. For example, in a color expression:

If(Column(2) > 0, Green(), Red())

This can be much neater than repeating the expression text and typically runs faster as well.

If you are on Qlik Sense May 2021 you can use Master Measure names in the expression like:

If([Total Sales] > 0, Green(), Red())

[Total Sales] need not be a measure in this chart.

Both QlikView and Qlik Sense also allow you to reference the Label of a measure/expression column in the chart. In most versions the syntax checker will declare this an error even though it calculates correctly. I tend to avoid the label technique due to this confusion.

 

Tip#6: Don’t use If() as a chart filter

Use If when you want to dynamically select from two or more alternatives.  If should not be used simply to filter data like this:

Sum(If(Region = 'EU' and CYTDFlag = 1, Sales))

Filtering is best done with Set Analysis. The same expression written with a Set:

Sum({<Region={'EU'}, CYTDFlag={1}>} Sales)

Set Analysis is much faster than If.  If you are new to Set Analysis, you might initially find the syntax more challenging than If.  But SA  is much more powerful than If and well worth mastering.

 

Tip#7:  Consider the other conditional functions. 

Alt() and Coalesce() can be a more compact and elegant approach to testing for nulls. Instead of:

If(IsNull(SalesRep), Manager, SalesRep)

use:

Coalesce(SalesRep, Manager)
// If you want to consider empty and 
// blank strings as Null:
Coalesce(EmptyIsNull(Trim(SalesRep)), Manager)

When testing against a list of values,  instead of multiple If() or “or”, use the Match() or WildMatch() functions instead.

If (Match(StateCode, 'VA', 'TN', 'FL', 'GA'), 'South',  'Other')

 

I hope you find these tips useful.  You can use my QSDA Pro tool to quickly filter and examine all the uses of the If() function in a Qlik Sense App, located on-prem or in SaaS.

-Rob

 


CubeTester

When working on Qlik Sense performance issues I frequently find I want to measure the performance of specific expressions. I might want to know how variations of an expression may perform against each other.  In a slow chart with many measures I want calculation time individually for each measure to focus my efforts.  Or perhaps I’m just satisfying a general curiosity or trying to settle a bet.

You can measure the performance of expression variations by modifying the chart and measuring the overall chart response time with something like Chrome Add Sense or QS Document Analyzer.  That approach can get kind of clunky especially when you are focused on a subset of measures in the chart.

I prefer a more structured approach to testing expressions. The tool I reach for is CubeTester.

CubeTester is an open source Nodejs command line tool for testing the performance of Qlik HyperCubes (Dimensions and Measures).  The test specification is written in a json file as either a HyperCubeDef or the “simplified” Dimension/Measure syntax.

Here’s a sample test written in simplified syntax that tests three variations of a cube (chart) containing one Dimension and three Measures.

I’ll run  CubeTester specifying the file that holds this test:

node index.js test tests/columns.json

And receive this output:

There is no significant difference in performance between the variations. Importantly, I can also see that all three return identical  total values as well.

CubeTester supports two commands:

  • test : Run tests.
  • extract: Extract app charts into a test file.

There are a number of options that can be specified on the command line or in the test definition. See the readme for more information on available options.

In addition to testing variations or trying out a theory, here are some other cases where I’ve used CubeTester.

  • When working with a mashup where my HyperCube exists only in code, there is no chart to test.
  • In a slow rendering chart I can test individual measures, combinations of measures and non-data expressions (like color expressions) to find the culprit.

Using CubeTester I can quickly try out ideas and document my progress as I work through an issue. I’ve made some interesting discoveries!

Some notes:

  • Testing against a server uses certificates for authentication.  (Pull request welcome if you want more auth options).
  • Make sure you specify “wss” when using a server endpoint eg
    wss://your.server:4747
  • You’ll need to test with enough data to get calculation times of sufficient magnitude.  Two results of 5 milliseconds vs 7 milliseconds are not precise enough to draw conclusions from.
  • Calculation time is affected by the capacity of the target machine and what else is running.  I recommend repeating tests until you see a stable pattern.  Use the --repeat option and take the lowest result from each repeat.

CubeTester is free to use. Have fun!

-Rob

 


Creating Temporary Script Associations

Summary: I review using Join, Lookup() and ApplyMap() as script techniques  to calculate using fields from multiple tables. I ultimately recommend ApplyMap().

Qlik charts  can calculate values on the fly, using fields from multiple tables in the model.  The associative model takes care of navigating (joining) the correct fields together.  Our expression syntax doesn’t  identify what table a field exists in — the associative logic takes care of this detail for us.

We may want to calculate a measure, for example, “Net Amount”, applying a business rule requiring fields from several tables:

Our expression to calculate “Net Amount” might look like this:

Sum(Amount) - 
RangeSum(
  Sum(Quantity * UnitCost), 
  Sum(Amount * Discount), 
  Sum(Amount * SalesTaxRate), 
  Sum(Amount * ExciseTaxRate)
)

There may be cases (such as performance) where we want to pre-calculate “Net Amount” as a new field in the script.  In script, we don’t have the magic associative logic to assemble the fields.  When a script expression is used to create a new field, all fields must be available  in a single load statement.  This is straightforward when all the required fields are in the same table.  But what do we do when the fields come from multiple tables?

Here are three approaches to solving the problem of calculating a new field using multiple tables in script.

  • Join
  • Lookup() function
  • ApplyMap() function

I’ll demonstrate deriving the same “Net Amount” calculation in the script.

JOIN

The Join option will require us to execute multiple joins to assemble the fields onto each Orders row and then finally do the calculation.  The script might look like this:

Left Join (Orders)
LOAD
 ProductId,
 UnitCost
Resident Products
; 
Left Join (Orders)
LOAD
 CustomerId,
 Discount,
 State
Resident Customers
; 
Left Join (Orders)
LOAD
 State,
 SalesTaxRate,
 ExciseTaxRate
Resident States
;

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * UnitCost,
   Amount * Discount,
   Amount * SalesTaxRate,
   Amount * ExciseTaxRate
 ) as NetAmount
Resident Orders
;
// Drop the extra fields from Orders.
Drop Fields State, UnitCost, Discount, SalesTaxRate, ExciseTaxRate
From Orders
;

It’s a fairly good option.  It can be a lot of code depending on how many fields and tables we need to traverse. We need to be aware of “how many hops” between tables and may require intermediate joins (State field) to get to the final field (SalesTaxRate & ExciseTaxRate).

When using Join we need to ensure we have no duplicate keys that would mistakenly generate additional rows.

LOOKUP

Lookup() seems the most natural to me. It’s the least amount of code and it even sounds right: “look-up”.  It’s a one-to-one operation so there is no danger of generating extra rows.

It’s my least used option due to performance as we shall see.

Lookup takes four parameters  – a field to return, the field to test for a match, a match value to search for and the table to search.  Using Lookup() our script will look like this:

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * Lookup('UnitCost', 'ProductId', ProductId, 'Products'),
   Amount * Lookup('Discount', 'CustomerId', CustomerId, 'Customers'),
   Amount * Lookup('SalesTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States'),
   Amount * Lookup('ExciseTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States')
 ) as NetAmount
Resident Orders
;

Note that for SalesTaxRate and ExciseTaxRate, the third parameter — the match value — is another Lookup() to retrieve the State. This is how we handle  multiple hops, by nesting Lookup().

It’s a nice clean statement that follows a simple pattern.  It performs adequately with small volumes of data.

Lookup does have a significant performance trap in that it uses a scan  to find a matching value.  How long to find a value is therefore dependent on where in the field the value is matched.  If it’s the first value it’s very quick, the 1000th value much longer, the 2000th value exactly twice as long as the 1000th. It’s a bit crazy making that it executes in O(n) time, for which I prefer the notation U(gh).

APPLYMAP

I like to think of the ApplyMap() approach as an optimized form of Lookup().  We first build mapping tables for each field we want to reference and then use ApplyMap() instead of Lookup() in the final statement. Our script will look like this:

Map_ProductId_UnitCost:
Mapping
Load ProductId, UnitCost
Resident Products
;
Map_CustomerId_Discount:
Mapping
Load CustomerId, Discount
Resident Customers
;
Map_CustomerId_State:
Mapping 
Load CustomerId, State
Resident Customers
;
Map_State_SalesTaxRate:
Mapping 
Load State, SalesTaxRate
Resident States
;
Map_State_ExciseTaxRate:
Mapping 
Load State, ExciseTaxRate
Resident States
;
NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * ApplyMap('Map_ProductId_UnitCost', ProductId, 0),
   Amount * ApplyMap('Map_CustomerId_Discount', CustomerId, 0),
   Amount * ApplyMap('Map_State_SalesTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)),
   Amount * ApplyMap('Map_State_ExciseTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0))
 ) as NetAmount
Resident Orders
;

The mapping setup can be a lot of code depending on how many fields are involved. But it’s well structured and clean.

In the final statement, we are “looking up” the value using ApplyMap() and it performs very quickly.  ApplyMap uses a hashed lookup so it does not matter where in the list the value lies, all values perform equally.

We can re-structure and simplify the mapping setup and subsequent use with a subroutine like this:

Sub MapField(keyField, valueField, table)
// Create a mapping table and set the v<valueField> variable
// equal to an ApplyMap() string.
 [Map_$(keyField)_$(valueField)]:
 Mapping Load [$(keyField)], [$(valueField)]
 Resident $(table);
 Set [v$(valueField)] = ApplyMap('Map_$(keyField)_$(valueField)', [$(keyField)]);
End Sub

Call MapField('ProductId', 'UnitCost', 'Products')
Call MapField('CustomerId', 'Discount', 'Customers')
Call MapField('CustomerId', 'State', 'Customers')
Call MapField('State', 'SalesTaxRate', 'States')
Call MapField('State', 'ExciseTaxRate', 'States')

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
 Quantity * $(vUnitCost),
 Amount * $(vDiscount),
 Amount * $(vSalesTaxRate),
 Amount * $(vExciseTaxRate)
 ) as NetAmount
;
LOAD
 *,
 $(vState) as State
Resident Orders
;

Note the use of the preceding load to handle the nested lookup of State.   You could also modify the Sub to handle some level of nesting as well.

I typically use the mapping approach as I find it always gives accurate results (with Join you must be careful of duplicate keys) and generally performs the best, and importantly, consistently.

Whether you are new to Qlik or an old hand I hope you found something useful in reading this far.

-Rob

 

 
