Creating Temporary Script Associations

Summary: I review using Join, Lookup() and ApplyMap() as script techniques  to calculate using fields from multiple tables. I ultimately recommend ApplyMap().

Qlik charts  can calculate values on the fly, using fields from multiple tables in the model.  The associative model takes care of navigating (joining) the correct fields together.  Our expression syntax doesn’t  identify what table a field exists in — the associative logic takes care of this detail for us.

We may want to calculate a measure, for example, “Net Amount”, applying a business rule requiring fields from several tables:

Our expression to calculate “Net Amount” might look like this:

Sum(Amount) - 
RangeSum(
  Sum(Quantity * UnitCost), 
  Sum(Amount * Discount), 
  Sum(Amount * SalesTaxRate), 
  Sum(Amount * ExciseTaxRate)
)

There may be cases (such as performance) where we want to pre-calculate “Net Amount” as a new field in the script.  In script, we don’t have the magic associative logic to assemble the fields.  When a script expression is used to create a new field, all fields must be available  in a single load statement.  This is straightforward when all the required fields are in the same table.  But what do we do when the fields come from multiple tables?

Here are three approaches to solving the problem of calculating a new field using multiple tables in script.

  • Join
  • Lookup() function
  • ApplyMap() function

I’ll demonstrate deriving the same “Net Amount” calculation in the script.

JOIN

The Join option will require us to execute multiple joins to assemble the fields onto each Orders row and then finally do the calculation.  The script might look like this:

Left Join (Orders)
LOAD
 ProductId,
 UnitCost
Resident Products
; 
Left Join (Orders)
LOAD
 CustomerId,
 Discount,
 State
Resident Customers
; 
Left Join (Orders)
LOAD
 State,
 SalesTaxRate,
 ExciseTaxRate
Resident States
;

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * UnitCost,
   Amount * Discount,
   Amount * SalesTaxRate,
   Amount * ExciseTaxRate
 ) as NetAmount
Resident Orders
;
// Drop the extra fields from Orders.
Drop Fields State, UnitCost, Discount, SalesTaxRate,ExciseTaxRate
From Orders
;

It’s a fairly good option.  It can be a lot of code depending on how many fields and tables we need to traverse. We need to be aware of “how many hops” between tables and may require intermediate joins (State field) to get to the final field (SalesTaxRate & ExciseTaxRate).

When using Join we need to ensure we have no duplicate keys that would mistakenly generate additional rows.

LOOKUP

Lookup() seems the most natural to me. It’s the least amount of code and it even sounds right: “look-up”.  It’s a one-to-one operation so there is no danger of generating extra rows.

It’s my least used option due to performance as we shall see.

Lookup takes four parameters  – a field to return, the field to test for a match, a match value to search for and the table to search.  Using Lookup() our script will look like this:

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * Lookup('UnitCost', 'ProductId', ProductId, 'Products'),
   Amount * Lookup('Discount', 'CustomerId', CustomerId, 'Customers'),
   Amount * Lookup('SalesTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States'),
   Amount * Lookup('ExciseTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States')
 ) as NetAmount
Resident Orders
;

Note that for SalesTaxRate and ExciseTaxRate, the third parameter — the match value — is another Lookup() to retrieve the State. This is how we handle  multiple hops, by nesting Lookup().

It’s a nice clean statement that follows a simple pattern.  It performs adequately with small volumes of data.

Lookup does have a significant performance trap in that it uses a scan  to find a matching value.  How long to find a value is therefore dependent on where in the field the value is matched.  If it’s the first value it’s very quick, the 1000th value much longer, the 2000th value exactly twice as long as the 1000th. It’s a bit crazy making that it executes in O(n) time, for which I prefer the notation U(gh).

APPLYMAP

I like to think of the ApplyMap() approach as an optimized form of Lookup().  We first build mapping tables for each field we want to reference and then use ApplyMap() instead of Lookup() in the final statement. Our script will look like this:

Map_ProductId_UnitCost:
Mapping
Load ProductId, UnitCost
Resident Products
;
Map_CustomerId_Discount:
Mapping
Load CustomerId, Discount
Resident Customers
;
Map_CustomerId_State:
Mapping 
Load CustomerId, State
Resident Customers
;
Map_State_SalesTaxRate:
Mapping 
Load State, SalesTaxRate
Resident States
;
Map_State_ExciseTaxRate:
Mapping 
Load State, ExciseTaxRate
Resident States
;
NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * ApplyMap('Map_ProductId_UnitCost', ProductId, 0),
   Amount * ApplyMap('Map_CustomerId_Discount', CustomerId, 0),
   Amount * ApplyMap('Map_State_SalesTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)),
   Amount * ApplyMap('Map_State_ExciseTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0))
 ) as NetAmount
Resident Orders
;

The mapping setup can be a lot of code depending on how many fields are involved. But it’s well structured and clean.

In the final statement, we are “looking up” the value using ApplyMap() and it performs very quickly.  ApplyMap uses a hashed lookup so it does not matter where in the list the value lies, all values perform equally.

We can re-structure and simplify the mapping setup and subsequent use with a subroutine like this:

Sub MapField(keyField, valueField, table)
// Create mapping table and set vValueField var // equal to ApplyMap() string.
 [Map_$(keyField)_$(valueField)]:
 Mapping Load [$(keyField)], [$(valueField)]
 Resident $(table);
 Set [v$(valueField)] = ApplyMap('Map_$(keyField)_$(valueField)', [$(keyField)]);
End Sub

Call MapField('ProductId', 'UnitCost', 'Products')
Call MapField('CustomerId', 'Discount', 'Customers')
Call MapField('CustomerId', 'State', 'Customers')
Call MapField('State', 'SalesTaxRate', 'States')
Call MapField('State', 'ExciseTaxRate', 'States')

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
 Quantity * $(vUnitCost),
 Amount * $(vDiscount),
 Amount * $(vSalesTaxRate),
 Amount * $(vExciseTaxRate)
 ) as NetAmount
;
LOAD
 *,
 $(vState) as State
Resident Orders
;

Note the use of the preceding load to handle the nested lookup of State.   You could also modify the Sub to handle some level of nesting as well.

I typically use the mapping approach as I find it always gives accurate results (with Join you must be careful of duplicate keys) and generally performs the best, and importantly, consistently.

Whether you are new to Qlik or an old hand I hope you found something useful in reading this far.

-Rob

 

 

Share

8 thoughts on “Creating Temporary Script Associations”

  1. Excellent Rob,
    I really appreciate for sharing such nice script and comparison!!
    Really clean and nice exaplanation.

  2. Thank you for sharing!
    It looks like a hack. 🙂

    … “local” and “global” variables… it is not about QLIK: all local is global and all global is local.))

    1. It depends. I didn’t comment much on performance specifics, except to say that Lookup() is always slow. Performance of ApplyMap() vs Join() in these type of scenarios will depend on your data.

  3. Really Helpful, I been using the join and Apply map, but Sub routine is better. Thanks for sharing useful.

Leave a Reply

Your email address will not be published.