Summary: I review using Join, Lookup() and ApplyMap() as script techniques to calculate using fields from multiple tables. I ultimately recommend ApplyMap().
Qlik charts can calculate values on the fly using fields from multiple tables in the model. The associative model takes care of navigating (joining) the correct fields together. Our expression syntax doesn’t even identify what table a field exists in — the associative logic takes care of this detail for us.
We may want to calculate a measure, for example, “Net Amount”, applying a business rule requiring fields from several tables:
Our expression to calculate “Net Amount” might look like this:
Sum(Amount) - RangeSum( Sum(Quantity * UnitCost), Sum(Amount * Discount), Sum(Amount * SalesTaxRate), Sum(Amount * ExciseTaxRate) )
There may be cases (such as performance) where we want to pre-calculate “Net Amount” as a new field in the script. In script, we don’t have the magic associative logic to assemble the fields. When a script expression is used to create a new field, all fields must be available in a single load statement. This is straightforward when all the required fields are in the same table. But what do we do when the fields come from multiple tables?
Here are three approaches to solving the problem of calculating a new field using multiple tables in script.
- Lookup() function
- ApplyMap() function
I’ll demonstrate deriving the same “Net Amount” calculation in the script.
The Join option will require us to execute multiple joins to assemble the fields onto each Orders row and then finally do the calculation. The script might look like this:
Left Join (Orders) LOAD ProductId, UnitCost Resident Products ; Left Join (Orders) LOAD CustomerId, Discount, State Resident Customers ; Left Join (Orders) LOAD State, SalesTaxRate, ExciseTaxRate Resident States ; NetAmount: LOAD OrderId, Amount - RangeSum( Quantity * UnitCost, Amount * Discount, Amount * SalesTaxRate, Amount * ExciseTaxRate ) as NetAmount Resident Orders ; // Drop the extra fields from Orders. Drop Fields State, UnitCost, Discount, SalesTaxRate,ExciseTaxRate From Orders ;
It’s a fairly good option. It can be a lot of code depending on how many fields and tables we need to traverse. We need to be aware of “how many hops” between tables and may require intermediate joins (State field) to get to the final field (SalesTaxRate & ExciseTaxRate).
When using Join we need to ensure we have no duplicate keys that would mistakenly generate additional rows.
Lookup() seems the most natural to me. It’s the least amount of code and it even sounds right: “look-up”. It’s a one-to-one operation so there is no danger of generating extra rows.
It’s my least used option due to performance as we shall see.
Lookup takes four parameters – a field to return, the field to test for a match, a match value to search for and the table to search. Using Lookup() our script will look like this:
NetAmount: LOAD OrderId, Amount - RangeSum( Quantity * Lookup('UnitCost', 'ProductId', ProductId, 'Products'), Amount * Lookup('Discount', 'CustomerId', CustomerId, 'Customers'), Amount * Lookup('SalesTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States'), Amount * Lookup('ExciseTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States') ) as NetAmount Resident Orders ;
Note that for SalesTaxRate and ExciseTaxRate, the third parameter — the match value — is another Lookup() to retrieve the State. This is how we handle multiple hops, by nesting Lookup().
It’s a nice clean statement that follows a simple pattern. It performs adequately with small volumes of data.
Lookup does have a significant performance trap in that it uses a scan to find a matching value. How long to find a value is therefore dependent on where in the field the value is matched. If it’s the first value it’s very quick, the 1000th value much longer, the 2000th value exactly twice as long as the 1000th. It’s a bit crazy making that it executes in O(n) time, for which I prefer the notation U(gh).
I like to think of the ApplyMap() approach as an optimized form of Lookup(). We first build mapping tables for each field we want to reference and then use ApplyMap() instead of Lookup() in the final statement. Our script will look like this:
Map_ProductId_UnitCost: Mapping Load ProductId, UnitCost Resident Products ; Map_CustomerId_Discount: Mapping Load CustomerId, Discount Resident Customers ; Map_CustomerId_State: Mapping Load CustomerId, State Resident Customers ; Map_State_SalesTaxRate: Mapping Load State, SalesTaxRate Resident States ; Map_State_ExciseTaxRate: Mapping Load State, ExciseTaxRate Resident States ; NetAmount: LOAD OrderId, Amount - RangeSum( Quantity * ApplyMap('Map_ProductId_UnitCost', ProductId, 0), Amount * ApplyMap('Map_CustomerId_Discount', CustomerId, 0), Amount * ApplyMap('Map_State_SalesTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)), Amount * ApplyMap('Map_State_ExciseTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)) ) as NetAmount Resident Orders ;
The mapping setup can be a lot of code depending on how many fields are involved. But it’s well structured and clean.
In the final statement, we are “looking up” the value using ApplyMap() and it performs very quickly. ApplyMap uses a hashed lookup so it does not matter where in the list the value lies, all values perform equally.
We can re-structure and simplify the mapping setup and subsequent use with a subroutine like this:
Sub MapField(keyField, valueField, table) // Create mapping table and set vValueField var // equal to ApplyMap() string. [Map_$(keyField)_$(valueField)]: Mapping Load [$(keyField)], [$(valueField)] Resident $(table); Set [v$(valueField)] = ApplyMap('Map_$(keyField)_$(valueField)', [$(keyField)]); End Sub Call MapField('ProductId', 'UnitCost', 'Products') Call MapField('CustomerId', 'Discount', 'Customers') Call MapField('CustomerId', 'State', 'Customers') Call MapField('State', 'SalesTaxRate', 'States') Call MapField('State', 'ExciseTaxRate', 'States') NetAmount: LOAD OrderId, Amount - RangeSum( Quantity * $(vUnitCost), Amount * $(vDiscount), Amount * $(vSalesTaxRate), Amount * $(vExciseTaxRate) ) as NetAmount ; LOAD *, $(vState) as State Resident Orders ;
Note the use of the preceding load to handle the nested lookup of State. You could also modify the Sub to handle some level of nesting as well.
I typically use the mapping approach as I find it always gives accurate results (with Join you must be careful of duplicate keys) and generally performs the best, and importantly, consistently.
Whether you are new to Qlik or an old hand I hope you found something useful in reading this far.