Category Archives: Scripting

How to Segment QVD Files

Summary: In this post I discuss when you may want to segment or “shard” QVD files and demonstrate some script patterns for producing and consuming segmented QVDs.

I recently received a question from a colleague who knew that I had done a lot of work with Qlik QVD files. He asked, “what’s the optimal size for a QVD file?”

My response was that in my experience there is no optimal physical size, but in many cases there are reasons to break a large QVD into multiple, smaller files. Dividing a large file into smaller files is called “segmenting” or “sharding”.

People generally start to think about segmenting QVDs when they encounter resource constraints while updating a QVD with an incremental load. In an incremental load, only new or changed rows are extracted from the database and merged with an existing QVD. This involves reading and writing the entire large QVD which can use significant time and I/O resources. This also means that the process takes increasingly longer as time marches on. Not pleasant.

Other reasons you may want to segment are consumption and archival patterns. It’s common to use the “latest n periods” of data in your active dashboard, for example the current 12 months. If the data is in a single large QVD, you have to roll off (delete) data older than 12 months. You can filter as you load, but that becomes an increasingly wasteful activity over time.

It’s likely that you will want to retain the older data in for use in special projects or historical dashboards.

Given the example scenario above, it would make sense to create one QVD for each month. This will provide predictable performance for incremental updates as well as the dashboard load. Older data could be kept forever and any set of months could be accessed efficiently.

How do we do perform this magic segmentation? Let’s assume an example QVD with these fields:

TransactionID: Primary key
TransactionDate: Original date of transaction
LastUpdate: Timestamp of last update to this row. Transactions may receive updates up to 60 days after creation.
other…: other fields such as Amount, Customer, etc

We want to create one QVD per month using the name “Transactions-YYYY-MM.qvd”. What determines which QVD a transaction is placed in? Is it the MonthStart(TransactionDate)? It depends…

The simplest technique is for the extract script to place everything loaded today into the current month QVD, regardless of the TransactionDate. The QVD name is assigned to a variable in the script using:

Let vQvdName = 'Transactions-' & Date(Today(1),'YYYY-MM') & '.qvd';

When later loading 12 QVDs into the dashboard, load front (most current QVD) to back with the clause:

Where not Exists(TransactionID)

The Where clause will ensure that only the most current row for that TransactionID will be loaded.

This simple technique might be ok for most scenarios. But it’s not very robust because it falls down when you want to do something like a full reload to add columns, or data is loaded off schedule. It also would require that if want to load something like 6 months from the middle, we have to be careful to include enough later QVDs to cover possible updates.

A more robust approach would be to store each transaction row in the QVD corresponding with it’s TransactionDate. Here is one script pattern to do just that. Our starting point for this script is that we have already extracted the new and changed rows to create table “Transactions”.

Step #1 is to collect the month values into a temp table:

TempMonths:
LOAD Distinct
MonthStart(TransactionDate) as TranMonth
Resident Transactions; 

Next we process each TranMonth value in a loop block. The block will build a temp table of rows for just one month and merge with any existing QVD.

For i = 1 to FieldValueCount('TranMonth')
Let vMonthName = Date(FieldValue('TranMonth', $(i)), 'YYYY-MM');
Set vQvdName = Transactions-$(vMonthName).qvd;


MonthTransactions:
NoConcatenate LOAD * Resident Transactions
Where MonthStart(TransactionDate) = FieldValue('TranMonth', $(i));


If FileSize('$(vQvdName)') > 0 THEN // If we have existing QVD
LOAD * From [$(vQvdName)] (qvd)
Where Not Exists(TransactionID);
ENDIF


Store MonthTransactions into [$(vQvdName)] (qvd);
Drop Table MonthTransactions;
Next i

Drop Table TempMonths, Transactions; 

The above segmenting script would support incremental reload, full reload or a load of any data in between.

So now we have many “Transactions-YYYY-MM.qvd” files. How do we load the current 12 months? Do we wake up early on the 1st of each month and quick change the script? No. We create a dynamic script based off the current day.

For i = 0 to -11 step -1  // 12 Months
Let vMonthName = Date(AddMonths(Today(1), $(i)), 'YYYY-MM');
Transactions:
LOAD *
From [Transactions-$(vMonthName).qvd] (qvd);
Next i 

If we had built the QVDs using any technique that allowed for the possibility of duplicate TransactionID, we would add a guard of “Where not Exists()”.

...
From [Transactions-$(vMonthName).qvd] (qvd)
Where not Exists(TransactionID); 

What About IntraDay High Volume Reloads?

In scenarios with Intraday loading and high transaction counts, I prefer to defer merging QVDs to off-peak times.

Let’s take an example scenario of a customer who generates approximately 10 million transactions per day, with peak hours creating about 2 million transactions. The main dashboard should be refreshed hourly for twelve hours each day and should contain the last 10 days of transactions. Of course all data should be kept around for various summary analyses and ad-hoc projects.

It makes sense to segment these QVDs by day. Our hourly incremental load will need to merge with — read and write — a fairly large daily QVD. Crucially, the load time gets longer as the day progresses and the QVD gets larger. And now I hear rumors of twice hourly reload. This pattern has a bad smell.

What to do? Let’s store the hourly incremental rows in a hourly QVD of their own. The dashboard will then pick up all hourly QVDs plus required daily QVDs. Overnight, when we have some breathing room, we will run a script to consolidate the hourlies into a daily QVD.

The hourly incremental QVD is created like:

Let vQvdName = 'Hourly-Transactions-' & Timestamp(Now(1), 'YYYY-MM-DD-hh-mm-ss') & '.qvd'; 
Store Transactions into [$(vQvdName)] (qvd); 

Then the dashboard will load the new data using a wildcard load for the Hourly QVDs and a loop for the prior days:

// Load all Hourly QVDs
Load * From [Hourly-Transactions-*.qvd] (qvd);
// Load previous 9 days of Daily QVDs
For i = 1 to 9 // 9 Days
Let vDateName = Date((Today(1) -$(i)), 'YYYY-MM-DD');
Transactions:
LOAD * From [Transactions-$(vDateName).qvd] (qvd);
Next i 

Getting LastUpdate From a QVD

One of the steps in incremental loading is determining what “zzz” value to use in the SQL “Where LastUpdate >= zzz”. We need the high value from the last load. Some people store this information in a side file or variable. I think the most reliable approach is to get the high value from the existing QVD.

Getting Max(LastUpdate) from a very large QVD can take some time (how to do this the quickest is always an interesting pub question). My preferred technique is to store a new field “MaxLastUpdate” in the QVD and then read only the first row of the QVD to retrieve the value.

Getting and Joining Max(LastUpdate) should be fairly quick because we are only dealing with the incremental rows.

Transactions:
SQL Select * From db.transactions where LastUpdate >= foo;
Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 

The lastest MaxLastUpdate value can then be retrieved by reading only the first row of the existing QVD. Here’s how it looks all together using the example of monthly QVDs.

Let vMonthName = Date(Today(1), 'YYYY-MM');
TempMonth:
First 1 Load MaxLastUpdate
From [Transactions-$(vMonthName).qvd] (qvd);
Let vMaxLastUpdate = TimeStamp(Peek('MaxLastUpdate'), 'MM-DD-YYYY hh:mm:ss');
Drop Table TempMonth;


Transactions:
SQL Select * From db.transactiions
where LastUpdate >= '$(vMaxLastUpdate)'; 


Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions; 


// Merge or segment with existing QVDs

I hope you found some useful tips in this article. No doubt you have some ideas of your own, feel free to add comments.

Want to learn more advanced scripting techniques? After 2 years of virtual sessions, the Masters Summit for Qlik is back with live events this fall. In September we’ll be in Madrid, Spain, and in November we’ll be in New Orleans, USA. If you want to take your Qlik skills to the next level, get access to all sorts of ready-to-use solutions and reference materials, share experiences and maybe a few horror stories with your peers then we hope to see you there!

Share

Can I Update a Mapping Table?

Summary: Can you concatenate rows to a Mapping Table? The answer is “Yes” and I show you how.

Recently my Masters Summit colleague Oleg Troyanksy asked me “What are the limits on updating an existing Mapping Table in Qlik script?”.  My immediate answer was “You can’t update a Mapping Table after it’s been created”.  Then Oleg showed me a script that surprised me — adding to a mapping table in a loop,  implicit concatenation.  The loop works, but what if we need to load from multiple sources?

When the contents of a mapping table come from multiple Load statements, I have always advised doing standard loads to a temp table and then a final Mapping Load from the temp table.

Turns out you can concatenate to a mapping table.  Maybe that temp table is unnecessary.  I don’t find any doc on the topic. Here’s what I’ve found from experimentation.

We may wish to do something like this:

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Concatenate (MapX)
Mapping Load * Inline [
 from, to
 3, C
 ];

That script will fail with  “Table ‘Mapx’ not found” error. You cannot directly reference a mapping table this way.

Interestingly, if we leave off the “Concatenate (MapX)”, it will concatenate and result in the desired mapping.  Implicit concatenation will kick-in and the second load will add rows to the mapping table.  I’ve included some ApplyMap() code in this example so you can copy/paste and test for yourself.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Mapping Load * Inline [
 from, to
 3, C
 ];

Data:
 Load RecNo() as Recid,
 ApplyMap('MapX', RecNo()) as Mapped
 AutoGenerate 5;

The resulting output looks like this, proving that 3/C has been added to the map.

Unlike implicit concatenation for standard tables, the fieldnames  need not be the same. This script will concatenate. Note the different fieldnames in the second load.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Mapping Load * Inline [
 X, Y
 3, C
 ];

What if there is an intervening standard load?  Will concatenation occur?

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];
 
Fact:
 LOAD 1 as X AutoGenerate 1; 

Mapping Load * Inline [
 X, Y
 3, C
 ];

The answer is no, concatenation will not happen. The second Mapping Load will create a new invisible mapping table.

So if I  can’t name the mapping table in a Concatenate prefix, is there some other way to explicitly concatenate?  Turns out there is. This will work.

MapX:
 Mapping Load * Inline [
 from, to
 1, A
 2, B
 ];

Fact:
 LOAD 1 as X AutoGenerate 1;

MapX:
 Mapping Load * Inline [
 X, Y
 3, C
 ];

When naming the mapping table again with the same label. explicit concatenation will occur!  This is unlike a standard load where repeating a label results in  a new table with “-1” suffix  (when fieldnames don’t match).

In summary, you can add rows to a mapping table. Repeating the table label again will ensure you are adding,  whether there is an intervening standard load or not.

Now you may be wondering …can I do an implied Join? I think not.

On Jan 20, 2021 I’ll be sharing more scripting tips and techniques in my “Advanced Scripting” on-line course as part of the “Masters Summit at Home” series.   Join me there for more on topics like creating reusable script and maintaining optimized loads.

Share

Creating Temporary Script Associations

Summary: I review using Join, Lookup() and ApplyMap() as script techniques  to calculate using fields from multiple tables. I ultimately recommend ApplyMap().

Qlik charts  can calculate values on the fly, using fields from multiple tables in the model.  The associative model takes care of navigating (joining) the correct fields together.  Our expression syntax doesn’t  identify what table a field exists in — the associative logic takes care of this detail for us.

We may want to calculate a measure, for example, “Net Amount”, applying a business rule requiring fields from several tables:

Our expression to calculate “Net Amount” might look like this:

Sum(Amount) - 
RangeSum(
  Sum(Quantity * UnitCost), 
  Sum(Amount * Discount), 
  Sum(Amount * SalesTaxRate), 
  Sum(Amount * ExciseTaxRate)
)

There may be cases (such as performance) where we want to pre-calculate “Net Amount” as a new field in the script.  In script, we don’t have the magic associative logic to assemble the fields.  When a script expression is used to create a new field, all fields must be available  in a single load statement.  This is straightforward when all the required fields are in the same table.  But what do we do when the fields come from multiple tables?

Here are three approaches to solving the problem of calculating a new field using multiple tables in script.

  • Join
  • Lookup() function
  • ApplyMap() function

I’ll demonstrate deriving the same “Net Amount” calculation in the script.

JOIN

The Join option will require us to execute multiple joins to assemble the fields onto each Orders row and then finally do the calculation.  The script might look like this:

Left Join (Orders)
LOAD
 ProductId,
 UnitCost
Resident Products
; 
Left Join (Orders)
LOAD
 CustomerId,
 Discount,
 State
Resident Customers
; 
Left Join (Orders)
LOAD
 State,
 SalesTaxRate,
 ExciseTaxRate
Resident States
;

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * UnitCost,
   Amount * Discount,
   Amount * SalesTaxRate,
   Amount * ExciseTaxRate
 ) as NetAmount
Resident Orders
;
// Drop the extra fields from Orders.
Drop Fields State, UnitCost, Discount, SalesTaxRate,ExciseTaxRate
From Orders
;

It’s a fairly good option.  It can be a lot of code depending on how many fields and tables we need to traverse. We need to be aware of “how many hops” between tables and may require intermediate joins (State field) to get to the final field (SalesTaxRate & ExciseTaxRate).

When using Join we need to ensure we have no duplicate keys that would mistakenly generate additional rows.

LOOKUP

Lookup() seems the most natural to me. It’s the least amount of code and it even sounds right: “look-up”.  It’s a one-to-one operation so there is no danger of generating extra rows.

It’s my least used option due to performance as we shall see.

Lookup takes four parameters  – a field to return, the field to test for a match, a match value to search for and the table to search.  Using Lookup() our script will look like this:

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * Lookup('UnitCost', 'ProductId', ProductId, 'Products'),
   Amount * Lookup('Discount', 'CustomerId', CustomerId, 'Customers'),
   Amount * Lookup('SalesTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States'),
   Amount * Lookup('ExciseTaxRate', 'State', Lookup('State', 'CustomerId', CustomerId, 'Customers'), 'States')
 ) as NetAmount
Resident Orders
;

Note that for SalesTaxRate and ExciseTaxRate, the third parameter — the match value — is another Lookup() to retrieve the State. This is how we handle  multiple hops, by nesting Lookup().

It’s a nice clean statement that follows a simple pattern.  It performs adequately with small volumes of data.

Lookup does have a significant performance trap in that it uses a scan  to find a matching value.  How long to find a value is therefore dependent on where in the field the value is matched.  If it’s the first value it’s very quick, the 1000th value much longer, the 2000th value exactly twice as long as the 1000th. It’s a bit crazy making that it executes in O(n) time, for which I prefer the notation U(gh).

APPLYMAP

I like to think of the ApplyMap() approach as an optimized form of Lookup().  We first build mapping tables for each field we want to reference and then use ApplyMap() instead of Lookup() in the final statement. Our script will look like this:

Map_ProductId_UnitCost:
Mapping
Load ProductId, UnitCost
Resident Products
;
Map_CustomerId_Discount:
Mapping
Load CustomerId, Discount
Resident Customers
;
Map_CustomerId_State:
Mapping 
Load CustomerId, State
Resident Customers
;
Map_State_SalesTaxRate:
Mapping 
Load State, SalesTaxRate
Resident States
;
Map_State_ExciseTaxRate:
Mapping 
Load State, ExciseTaxRate
Resident States
;
NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
   Quantity * ApplyMap('Map_ProductId_UnitCost', ProductId, 0),
   Amount * ApplyMap('Map_CustomerId_Discount', CustomerId, 0),
   Amount * ApplyMap('Map_State_SalesTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0)),
   Amount * ApplyMap('Map_State_ExciseTaxRate', ApplyMap('Map_CustomerId_State', CustomerId, 0))
 ) as NetAmount
Resident Orders
;

The mapping setup can be a lot of code depending on how many fields are involved. But it’s well structured and clean.

In the final statement, we are “looking up” the value using ApplyMap() and it performs very quickly.  ApplyMap uses a hashed lookup so it does not matter where in the list the value lies, all values perform equally.

We can re-structure and simplify the mapping setup and subsequent use with a subroutine like this:

Sub MapField(keyField, valueField, table)
// Create mapping table and set vValueField var // equal to ApplyMap() string.
 [Map_$(keyField)_$(valueField)]:
 Mapping Load [$(keyField)], [$(valueField)]
 Resident $(table);
 Set [v$(valueField)] = ApplyMap('Map_$(keyField)_$(valueField)', [$(keyField)]);
End Sub

Call MapField('ProductId', 'UnitCost', 'Products')
Call MapField('CustomerId', 'Discount', 'Customers')
Call MapField('CustomerId', 'State', 'Customers')
Call MapField('State', 'SalesTaxRate', 'States')
Call MapField('State', 'ExciseTaxRate', 'States')

NetAmount:
LOAD
 OrderId,
 Amount - RangeSum(
 Quantity * $(vUnitCost),
 Amount * $(vDiscount),
 Amount * $(vSalesTaxRate),
 Amount * $(vExciseTaxRate)
 ) as NetAmount
;
LOAD
 *,
 $(vState) as State
Resident Orders
;

Note the use of the preceding load to handle the nested lookup of State.   You could also modify the Sub to handle some level of nesting as well.

I typically use the mapping approach as I find it always gives accurate results (with Join you must be careful of duplicate keys) and generally performs the best, and importantly, consistently.

Whether you are new to Qlik or an old hand I hope you found something useful in reading this far.

-Rob

 

 

Share

Num() — Script vs Chart

Summary: I review a subtle difference between using Num() in the script vs Num() in charts. I make mistakes so you don’t have to 🙂 

2020 marks my 13th year blogging about Qlik.  I’m still learning, and still making mistakes!  I’ll review a problem I created for myself recently, with the hope that you won’t have to repeat the exercise.

Here is a simplified example of the problem, enough to demonstrate the issue:  This is the starting data table.

We have some number of “Metric” and  “Value” along with fields that describe how the data should be formatted for display.  The “Format” field contains a Qlik format specification.  Format may be used in a Num() function as shown in the “Num(Value, Format)” measure in this table. Output is as expected,  so far so good.

In my  project, it became required to format the Value in the script. I moved the Num() measure to the script like this:

Num(Value, Format) as ValueFormatted

I expected ValueFormatted to yield the same result as the chart measure. Adding “ValueFormatted” to the chart yields this:

The results are not the same.  ValueFormatted for “Days to ship” is incorrect. The other values are correct.

Let’s introduce another variation. This time I’ll sort the input when I create ValueFormatted.

Left Join(Data) 
Load 
    *, 
    Num(Value, Format) as ValueFormatted 
Resident Data 
Order by Metric Desc ;

Now “Customers per location” is incorrect and the other values are correct! What gives?

The Num() function returns a Dual() value . Duals have both a string (display) and a numeric value.  When populating a data model field, Qlik will use a single string representation for a given numeric value for that field.  The string selected will be the first encountered for that numeric value.

“Customers per location” and “Days to ship” shared the numeric value 4.  In the data model, one or the other string representation — “4” or “4 days” — will be used, depending on which one is created first.

To get the correct results in this scenario — that is, unique strings dependent on Format — add the Text() function to extract the string at runtime.

Text(Num(Value, Format)) as ValueFormatted

Resolution came of course after I took the advice of my colleague Barry  to “take a walk”.

I hope my story might save you some trouble. Happy scripting!

-Rob

Share

Chart Search in QlikView?

Summary: I demonstrate how to self-collect qvw metadata in a load script and use the metadata to implement a chart search feature in my qvw.

Qlik Sense has that cool chart search feature. Can we have the same in QlikView? Something where we can search for a keyword like “price” and see all the charts that have “price” in the title and on selection go directly to that chart?  Maybe searching chart Dimensions and Expressions as well?

In this downloadable example qvw, I’ve included script on the last tab to read xml metadata from the current qvw and build a table of chart titles linked to associated sheets.  When a title is selected, an Action (Document Properties, Triggers) assigned to the field  will go to the associated sheet.

Download the example and check out the script.  In the script you’ll notice some configuration options to include Dimensions and Expressions in the search.  They are set off in the example but feel free to play.

You’ll also notice in the script some code for mapping  container objects to sheets. Unfortunately, the xml metadata does not contain this mapping so it has to been added if you want it.  Objects outside containers can be mapped automatically.

For me this was a “just for fun” exercise, no one asked for it (although I thought I saw something on Qlik Community…).  Let me know if you make a useful implementation of it or if you improve the process.

-Rob

 

 

Share

Parsing Non-Standard Signs

Summary: I demonstrate using Num#() and Alt() functions to read numbers with non-standard signs.  Download link at bottom of post.

When reading from text files, the default Qlik interpretation of numeric sign syntax is as follows:

100:  No prefix, positive number
+65: “+” prefix, positive number
-110: “-” prefix, negative number

In the default interpretation a “-” suffix or “()” are not recognized as valid numbers and are loaded as text values.

120-
(200)

Sign indicators like “CREDIT” or “DEBIT” are by default unknown to Qlik and the value will be loaded as text.

300 CREDIT
400 DEBIT

In a Table Box, Chart Dimension or Listbox, numeric values are by default right aligned and text values are left aligned by default. This is a simple way to check what is text and what is numeric.

Aggregation functions, such as Sum(), treat text values as zero.  So a chart using the example numbers above would look like this:

 

 

 

 

 

I can utilize the num#() script function to tell Qlik how to read numbers using other than  default signs. For example, to indicate that a trailing minus is used:

Num#(Sample, '0;0-') as Amount2

That takes care of “120-“.  But what about the other odd signs?  I can nest multiple num#() functions inside Alt() to test various patterns:

 Alt(
   Num#(Sample, '0;0-')
   ,Num#(Sample, '0;(0)')
   ,Num#(Sample, '0 CREDIT;0 DEBIT')
 ) as Amount3

The chart demonstrates that all values are correctly recognized as numbers.  They do retain their input values as the display format.

 

 

 

 

 

If I want to harmonize the display formats, I can add an outer Num() function to indicate the display format for all.

 Num(
   Alt(
     Num#(Sample, '0;0-')
     ,Num#(Sample, '0;(0)')
     ,Num#(Sample, '0 CREDIT;0 DEBIT')
   )
 ,'#,##0.00;(#,##0.00)') as Amount4

Downloadable QV & QS examples to accompany this post can be found here.

-Rob

Share

qcb-qlik-sse, A General Purpose SSE

In Qlikview we have the ability to add function to the scripting language by writing VbScript in the document module (sometime called the “macro module”).  Typical  additions included regular expression matching & parsing,

Qlik Sense does not have the module feature, but both Sense and QlikView share a similar feature,  Server Side Extension (SSE).  SSE is typically positioned as a method to leverage an external calculation engine such as R or Python  from within Qlik script or charts.   The Qlik OSS team has produced a number of SSE examples in various languages.

SSE seems to be a good fit for building the “extra” functions (such as regex) that I am missing in Sense.  The same SSE can serve both Sense and QlikView.

Installing and managing a SSE takes some effort  so I’m  clear I don’t want to create a new SSE for every new small function addition.  What I want is a general purpose SSE where I can easily add new function, similar to the way QlikView Components does for scripting.

Miralem Drek has created a package, qlik-sse,  that makes for easy work of implementing an SSE using nodejs.  What I’ve done is use qlik-sse to create qcb-qlik-sse,  a general purpose SSE that allows functions to be written in javascript and added in a “plugin” fashion.

My motivating  principles for qcb-qlik-sse:

  • Customers set up the infrastructure — Qlik config & SSE task — once.
  • Allow function authors to focus on creating function and not SSE details.
  • Leverage community through a shared function repository.

I’ve implemented a number of functions already.  You can see the current list here. Most of the functions thus far are string functions like RegexTest and HtmlExtract that I frequently have implemented in  QlikView module or I’ve missed from other languages.

One of the more interesting functions I’ve implemented is CreateMeasure(), which allows you to create Master Measures from load script.  This is a problem I’ve been thinking about for some time and qcb-qlik-sse seemed to be a natural place to implement.

If you want to give qcb-qlik-sse a try, download or clone the project. Nodejs 8+ is required.  Some people report problems trying to install grpc using node 12, so if you are new to all this I recommend you install nodejs v10 instead of the latest v12.

If you are familiar with github and npm, you will hopefully find enough information in the readme(s) to get going. If not, here’s a quickstart.

  1. Install nodejs if not already present.  To check the version of nodejs on your machine, type at a command prompt:
    node --version
  2. Download and extract qcb-qlik-sse on the same machine as your Qlik desktop or server.
  3. From a command prompt in the qcb-qlik-sse-master directory install the dependent packages:
    npm install
  4. Configure the SSE plugin in Qlik.  Recommend prefix is QCB. If configuring in QlikView or Qlik Sense Desktop the ini statement will be:
     SSEPlugin=QCB,localhost:50051
  5. Start the SSE:
     ./runserver.cmd

The “apps” folder in the distribution contains a sample qvf/qvw that exercises the functions.

I’d love to get your feedback and suggestions on usage or installation.

-Rob

 

Share

A Common SSE Plugin Project

Summary: I introduce qcb-qlik-sse, a community Server Side Extension to share custom Qlik functions. 

At the Masters Summit for Qlik, I dive into several different methods of creating reusable script and custom functions.  In QlikView we have the ability to write custom functions using VbScript/Jscript in the qvw Module.

Custom functions have been useful for things like regular expressions, geo calculations, url encoding, encryption and others.  I’ll call them “edge functions” — some of us need them, some of us don’t.

Qlik Sense does not have the module facility. How can we satisfy the requirement for custom function in Qlik Sense?  The Server Side Extension (SSE) facility can fill the need and is available to both Qlik Sense and QlikView.

An SSE Plugin runs as a separate task and provides communication with a Qlik Script or chart Expression via a TCP port. The same SSE Plugin can serve both QS and QV.

Anyone can write an SSE. The Qlik team provides the SSE base and you write a plugin that wires your new functions to Qlik. The new functions can be used in both Script and Charts.  A number of plugins have already been produced.

SSE seems to be the ideal place to provide a collection of edge functions.  Rather than a bunch of one-offs, I’m thinking a good idea would be to pool resources into a single effort that could be shared, much like QlikView Components did for Script.

I’ve implemented this idea as qcb-qlik-sse.  This server uses as it’s base the qlik-sse package created by Miralem Drek.

qcb-qlik-sse is written in javascript and runs in node.js.  At startup, the server scans it’s “/functions” directory and discovers what functions are available.  The general idea is that you can add new function by creating a new js file.  You can remove function you don’t want available by deleting the corresponding js file.

See what functions I’ve already implemented in the doc here. I’ve also provided an example qvf and qvw that exercise the functions.

If you want to try it out,  download the project and define the plugin to QS or QV as documented here.  You will also need node.js 8 or later installed.

Defined functions will show up in the suggestion list in both the script and expression editors.

 

If you want to add functions, some javascript skills are required.  Follow the directions in the readme and submit a PR.

I’ve labeled the project as “experimental” at this stage because I anticipate there could be some significant restructuring as I get feedback.

Let me know your ideas and if you find this useful!

-Rob

Want to kick the tires on reusable code and make your Qlik team more efficient? Come to the Masters Summit for Qlik, a three day advanced training event for Qlik Developers. 

 

 

Share

Script Interpretation and Evaluation

Summary: Qlik Script is both interpreted and evaluated. Understanding the meaning of these terms is useful in writing advanced script and understanding why some script “tricks” work and others don’t.

In my Advanced Scripting session at the Masters Summit for Qlik, I sometimes begin with the question “What does $() do in script?”.  A common answer I hear is “it causes the variable contents to be treated as an expression and evaluated”.  While in some cases this is the practical result, the answer is incorrect.

$() “Dollar Sign Expansion or DSE”, which takes place during interpretation, replaces the variable name with the value of the variable.  This process is called “expansion” or “substitution”. The new value may subsequently get evaluated as an expression during the evaluation phase, but only if an expression is expected in that specific script statement.  Let’s take a look.

SET vVal = 1 + 1; // expecting "1 + 1"
LET v1 = $(vVal); // expecting "2"
SET v2 = $(vVal); // expecting "1 + 1"

$(vVal) will substitute the contents of the vVal variable which is the string “1 + 1”.   After execution, we expect v1 to be “2” because the behavior of a LET statement is to evaluate what is to the right of the equal sign as an expression.  We expect v2 will be “1 + 1” because the documented behavior of the SET statement is to treat the value to the right of equals as a literal string, not an expression.

The individual pieces of a script statement are carved up by the script parser into “symbols” or tokens. The script language definition defines what types — Expressions, Literals, Numbers, etc. — are valid for each symbol.  Only symbols defined as Expressions will be evaluated as such.

The TRACE statement expects a String as it’s symbol.  We can’t calculate a dynamic value within the TRACE statement itself.  We must calculate the value into a variable and then reference the variable like this:

LET vRows = NoOfRows('Orders');
Trace Orders has $(vRows) rows;

It would be useful if script supported the “$(=)” DSE form, like Set Analysis. Then we could do something like this.

Trace Orders has $(=NoOfRows('Orders')) rows;

Simple enough so far.  Let’s look at some more subtle examples.  A user on Qlik Community posted  a requirement to dynamically form several fieldnames in a load statement based off the current date.  She wanted to load a specific field as ‘Month_MMM’ where MMM is the current month-1, another field as current month-2 and so on.  She created a reusable (good idea) variable-with-parameter to create the names:

SET vMonthFieldname = 'Month_' & Date(AddMonths(today(),$1),'MMM');

If called in July as $(vMonthFieldname(-1)) the expected return value would be “Month_Jun”.  The LOAD statement would look something like:

LOAD
  Key,
  field1 as [$(vMonthField(-1))],
  field2 as [$(vMonthField(-2))]

This script executed without error, but to her surprise the final table had unexpected field names:

The variable expansion took place during interpretation, returning the expression string that was intended to form the fieldname.  However, the “as aliasname” clause only expects a literal so no evaluation took place and the string was used as-is for the fieldname.  Like the TRACE example, the workaround would be to first build the values  using LET:

LET vMonth1=$(vMonthField(-1));
LET vMonth2=$(vMonthField(-2));
Data:
LOAD
  Key,
  field1 as [$(vMonth1)],
  field2 as [$(vMonth2)]

How about if we used vMonthField  as the fieldref parameter (to the left of the “as” keyword)?

LOAD
  Key,
  $(vMonthField(-1)) as Month1,
  $(vMonthField(-2)) as Month2

The symbol to the left of “as” may be an Expression, so this would generate expected values. (Admittedly, this is not what the poster asked for, I include it only for illustration).

Let’s visit another illustration of interpretation and evaluation.  How many times will this loop execute?

SET vCounter = 1;
Do While $(vCounter) <= 3
  TRACE Counter is $(vCounter);
  LET vCounter = $(vCounter)+1;
Loop

If your answer is “forever”, you are correct.  Looking at the progress window, we can see  the counter increments beyond 3 but the script continues running.

The script execution log is where we can see what script looks like after interpretation and DSE.

2019-07-25 00:48:17 0021 Do While 1 <= 3
2019-07-25 00:48:17 0022 TRACE Counter is 1
2019-07-25 00:48:17 0022 Counter is 1
2019-07-25 00:48:17 0023 LET vCounter = 1+1
2019-07-25 00:48:18 0025 Loop
2019-07-25 00:48:18 0022 TRACE Counter is 2
2019-07-25 00:48:18 0022 Counter is 2
2019-07-25 00:48:18 0023 LET vCounter = 2+1
2019-07-25 00:48:19 0025 Loop
2019-07-25 00:48:19 0022 TRACE Counter is 3
2019-07-25 00:48:19 0022 Counter is 3
2019-07-25 00:48:19 0023 LET vCounter = 3+1

The Do While condition is fixed at “1 <= 3” which will always be true!  According to the documentation for Do While:

The while or until conditional clause must only appear once in any do..loop statement, i.e. either after do or after loop. Each condition is interpreted only the first time it is encountered but is evaluated for every time it is encountered in the loop.

The doc tells us our $() expansion — which is interpretation — will happen only once.  Evaluation, on the other hand, will be done repeatedly.  The correction to the loop will be to remove $() so our statement looks like this:

Do While vCounter <= 3

This is a valid expression and the loop will execute only 3 times.

So where do we need $() in this loop? We need it in the TRACE because we do not have evaluation, only interpretation.  The While condition and the LET symbol both allow Expressions, therefore we can count on evaluation to get the current variable value.

Do While vCounter <= 3
  TRACE Counter is $(vCounter);
  LET vCounter = vCounter + 1;
Loop

I hope this discussion and examples help you to understand script  interpretation and evaluation, especially in the context of $().

Share

Loading Varying Column Names

Summary:  A script pattern to wildcard load from multiple files when the column names vary and you want to harmonize the final fieldnames.  Download example file here.

I’m sometimes wondering “what’s the use case for the script ALIAS statement?”.  Here’s a useful example.

Imagine you have a number of text files to load; for example extract files from different regions.  The files are similar but have slight differences in field name spelling.   For example the US-English files use “Address” for a field name, the German file “Adresse” to represent the same field and the Spanish file “Dirección”.

We want to harmonize these different spellings so we have a single field in our final loaded table.  While we could code up individual load statements with “as xxx” clause to handle the rename, that approach could be difficult to maintain with many variations.  Ideally we want to load all files in a single load statement and describe any differences in a clear structure.  That’s where ALIAS is useful.  Before we load the files, use a set of ALIAS statements only for the fields we need to rename.

ALIAS Adresse as Address;
ALIAS Dirección as Address;
ALIAS Estado as Status;

The ALIAS will apply the equivalent “as” clause to those fields if found in a Load.

We can now load the files using wildcard “*” for both the fieldlist and the filename:

Clients:
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;

It’s magic I tell you!

What if the files have some extra fields picked up by “LOAD *” that we don’t want?  It’s also possible that the files have different numbers of fields in which case automatic concatenation won’t work.  We would get some number of “Client-n” tables which is incorrect.

First we will add the Concatenate keyword to force all files to be loaded into a single table.   As the table doesn’t exist, the script will error with “table not found” unless we are clever.  Here is my workaround for that problem.

Clients:
LOAD 0 as DummyField AutoGenerate 0;
Concatenate (Clients)
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;
DROP Field DummyField;

Now let’s get rid of those extra fields we don’t want.  First build a mapping list of the fields we want to keep.

MapFieldsToKeep:
 Mapping
 LOAD *, 1 Inline [
 Fieldname
 Address
 Status
 Client
 ]
 ;

I’ll use a loop to DROP fields that are not in our “keep list”.

For idx = NoOfFields('Clients') to 1 step -1
  let vFieldName = FieldName($(idx), 'Clients');
  if not ApplyMap('MapFieldsToKeep', '$(vFieldName)',   0) THEN
    Drop Field [$(vFieldName)] From Clients;
EndIf
Next idx

The final “Clients” table contains only the fields we want, with consistent fieldnames.

Working examples of this code for both  Qlik Sense and QlikView  can be downloaded here.

I hope you find some useful tips in this post. Happy scripting.

-Rob

 

 

Share