Category Archives: Performance

Better Calendar Scripts

TLDR: A new Qlik Sense training video uses a tired old-school Master Calendar script. I propose that updated training materials should use an updated script.

I just watched a new video by Michael Tarallo of Qlik titled “Understanding the Master Calendar – Qlik Sense and QlikView”. Before I pick on the contents of this particular video, I want to credit Michael for producing many excellent training videos that are worth watching and learning from. I highly recommend them.

The video does a great job of explaining the need for and function of a Master Calendar in your data model. It then goes on to show an actual script.

I can’t discuss the Master Calendar without expressing disappointment that calendar generation is not yet a built-in function in Sense. Something like what QlikView Components (QVC) does with a single script line:

CALL Qvc.CalendarFromField('OrderDate');

On to the script used in this new video. I’ll reproduce the entire script below and then comment on the techniques used and suggest some more “modern” approaches.

The video script is similar to the script used in the current QlikView Developer course. I acknowledge that this script works as is and produces correct results. But I don’t think it should be taught to newbies as good scripting. Here’s the script from the video:

Obsolete Code

1. Why is this field created and where is it used?

It’s not used. It’s left over from a very old version of the exercise and it doesn’t serve any purpose.

2. Why are we sorting the table? Is this statement useful?

Even if I could think of a good reason why the Calendar table should be in order, it already is in this order because the TempCalendar was generated in a loop. The statement is unnecessary.

Inefficient Code

Loading from a Resident table can be very slow when the table is large.


Experienced scripters use the FieldValues array instead.

What’s the difference? FieldValues only reads the distinct values of a field — maybe a thousand or so for several years of dates. Resident reads every row of the table. For a 10M row fact table, that’s 10M reads and the time scales up linearly. The difference can be dramatic.
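As a minimal sketch of the difference (assuming a fact table named Orders with an OrderDate field), here are the two ways of finding the calendar date range; the FieldValues version matches the full script at the end of this post:

// Option 1 (slow): Resident scans every fact row
MinMax:
LOAD
 min(OrderDate)-1 as mindate,
 max(OrderDate) as maxdate
RESIDENT Orders;

// Option 2 (fast): FieldValues scans only the distinct values of OrderDate
MinMax:
LOAD
 min(FieldValue('OrderDate', recno()))-1 as mindate,
 max(FieldValue('OrderDate', recno())) as maxdate
AUTOGENERATE FieldValueCount('OrderDate');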

Error Prone and Extra Work

Peek(), used on lines 13 & 14, is one of those functions that fails silently. That is, if you misspell a field or table, you won’t get a script error. Misspelling a variable will also not generate a script error. Maybe. Or maybe not. Or maybe you will get a Calendar that starts in year 1899. It depends on which line you make the spelling error on. If your misspelling does result in a script syntax error, it will be downstream from where you created the problem. There are multiple ways to mess this one up and some very curious potential results.
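To make the failure mode concrete, here is a hedged sketch of the kind of Peek() assignment the exercise script uses (the field, table and variable names are assumptions):

// The field is actually named 'MinDate'; the misspelling below
// silently returns NULL instead of raising a script error.
LET varMinDate = Peek('MnDate', 0, 'TempCalendar');
// The NULL only surfaces downstream, e.g. as a date loop that never
// runs, or a calendar anchored at numeric zero (December 1899).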

Don’t forget to DROP those temp tables on lines 15 and 35.

And those varMinDate and varMaxDate variables really should be cleared as well.

You can avoid all the syntax traps and extra cleanup by coding this as a Preceding Load. Here’s the same script written as a Preceding Load (the full copyable version appears at the end of this post):

Nothing to remember (or forget) to clean up. If you misspell a fieldname, you will get an understandable error at the right place. This is the calendar script I wish we would provide to newcomers.

Of course if you’ve attended the Masters Summit for QlikView, you’ve learned all about FieldValues, Preceding Loads and more. If not, attend a summit this Fall in NYC or Copenhagen.

QVC users don’t even get out of bed to generate Calendars. But they know that all that best practice stuff is happening under the covers. If you want to learn more about QVC, join my online class June 4 or a future class at Q-on.bi.

-Rob

Entire script suitable for copying:

MasterCalendar: 
Load 
 TempDate AS OrderDate, 
 week(TempDate) As Week, 
 Year(TempDate) As Year, 
 Month(TempDate) As Month, 
 Day(TempDate) As Day, 
 'Q' & ceil(month(TempDate) / 3) AS Quarter, 
 Week(weekstart(TempDate)) & '-' & WeekYear(TempDate) as WeekYear, 
 WeekDay(TempDate) as WeekDay 
;

//=== Generate a temp table of dates === 
LOAD 
 date(mindate + IterNo()) AS TempDate
 ,maxdate // Used in InYearToDate() above, but not kept 
WHILE mindate + IterNo() <= maxdate;

//=== Get min/max dates from Field ===//
LOAD
 min(FieldValue('OrderDate', recno()))-1 as mindate,
 max(FieldValue('OrderDate', recno())) as maxdate
AUTOGENERATE FieldValueCount('OrderDate');


Document Analyzer Numeric Field Recommendations

In the latest release (2.4) of QV Document Analyzer I’ve started the process of adding “Recommendations”, highlighting specific areas where your document can potentially be improved.

The recommendation included in this release is “Numeric field sizes”. Numeric fields that occupy more than 8 bytes of storage per value will be flagged. They will be highlighted on the Fields sheet and additional details will be provided on the Recommendation sheet. The potential savings from “fixing” each field are also shown.

So what is this 8 byte thing? Ideally, most numeric fields (which includes timestamps) can be represented by 8 bytes of storage. In the Number format property or the script num() function you assign a formatting mask to be used for string representation. QV stores the 8-byte number and applies the mask at display time.

Occasionally, particularly when doing an un-optimized load from QVD, the data can wind up being saved as “Mixed”, which means both the numeric and string representations are stored for each value. So you may see Symbol widths of 17, 20, 24 or even greater for this field.

In larger apps with many unique numeric values, I’ve found significant memory savings by restoring these Mixed fields to a numeric format.
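One common remedy, shown here as a hedged sketch (the table and field names are assumptions), is to reload the field through num() with an explicit format mask, so that QV stores only the 8-byte number plus a single shared format:

Facts:
LOAD
 num(Amount, '#,##0.00') as Amount, // one numeric value + one format mask, not per-row strings
 OrderID,
 OrderDate
FROM Facts.qvd (qvd);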

The usual tuning and optimizing caveat applies here. Don’t spend effort recovering a few megabytes. You have better things to do. But if you are wrestling with the megabeast app, fixing the biggest of those fields may be worth the effort.

-Rob


How to Choose an Expression

This is a follow-on to my post “How Not to Choose an Expression” that described the performance problem sometimes experienced in large apps when choosing one of several expressions.

I received a number of questions about the scalability of my suggested solution to define multiple Expressions using the Expression Conditional property. In this post I’ll present an alternative for when you have a large number of option combinations.

Before I dive in, an important comment about coding for performance. First, you should code for clarity and maintainability. If your document contains only a few million rows, it probably won’t matter if you use if() or an alternative technique. I’m fond of the Donald Knuth quote “Premature optimization is the root of all evil (or at least most of it) in programming”. The techniques presented in this post are meant to solve an identifiable performance problem. I wouldn’t bother implementing them unless I have a need.

Let’s use a scenario where calculations should reflect user selected options.

  • US Dollars or Local Currency — USD | LC
  • Include VAT? — Y | N

I’m only dealing with two options to keep my example manageable. You should be able to extend the concepts into many options.

The if method chart Expression for our choice may look like this:

if(CurrencyType='USD' AND [Include VAT?]='N'
 ,Sum(SalesAmount_USD)
,if(CurrencyType='USD' AND [Include VAT?]='Y'
 ,Sum(SalesAmount_USD + VAT_USD)
,if(CurrencyType='LC' AND [Include VAT?]='N'
 ,Sum(SalesAmount_LC)
,if(CurrencyType='LC' AND [Include VAT?]='Y'
 ,Sum(SalesAmount_LC + VAT_LC)
))))

The [CurrencyType] field controls which field we will sum() and the [Include VAT?] field controls if the associated VAT field is included in the sum(). What’s the difference between the alternatives? Only the fields listed in the sum().

Our Document contains 100M rows and we’ve identified this expression as a bottleneck. What are some alternatives to make this perform better?

In my previous post, I discussed dividing this into four expressions and making a choice using the Expression Conditional property. I won’t repeat the details here. In this case, I don’t want to create multiple expressions in the charts. What is another possible technique?

Start by loading a table that reflects the choice fields and the associated sum() fields.

This is an island table that is not linked to any other tables in our model.
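A hedged sketch of such a table (the table name and INLINE form are my own; the SalesExprField values mirror the four sum() variations above):

SalesExprMap:
LOAD * INLINE [
CurrencyType, Include VAT?, SalesExprField
USD, N, SalesAmount_USD
USD, Y, SalesAmount_USD + VAT_USD
LC, N, SalesAmount_LC
LC, Y, SalesAmount_LC + VAT_LC
];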

Create Listboxes for [CurrencyType] and [Include VAT?]. Set the “Always One Selected” property in the Listboxes. This will force the user to make selections, and at any given time only one row will be possible in our table.

We will then reference the SalesExprField in our chart using Dollar Sign Expansion (DSE). DSE is performed before the expression is evaluated. Our chart expression is now:

Sum($(=SalesExprField))

The “=” within the DSE says to evaluate this as an expression. In other words, substitute in the value of the SalesExprField and then evaluate the Sum().
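For example, with CurrencyType “USD” and Include VAT? “Y” selected, the DSE resolves and the chart calculates:

Sum(SalesAmount_USD + VAT_USD)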

Do you want to see what DSE is actually substituting? In a Straight Table, clear the Label field. The substituted expression will be visible in the column heading.

If we are not sure that “Always One Selected” is guaranteed, we should use some type of aggregation function to ensure a single SalesExprField is selected. For example, to take the first possible value:

Sum($(=FirstSortedValue(DISTINCT SalesExprField,1)))

Instead of just the parameters to the sum() function, we could have included the entire expression in our island table, e.g. “Sum(SalesAmount_LC + VAT_LC)”. In that case our chart expression would be:

$(=SalesExprField)

What if we want to control the options via variables instead of fields? Use a Set expression in the Only() function.

Sum(
$(=only({<CurrencyType={$(vCurrencyType)},[Include VAT?]={$(vIncludeVAT)}>}SalesExprField))
)

If you’ve read this far, you are probably wondering “Is there a downloadable example?”. Yes, it can be found here: Qlikview Cookbook: How to Choose an Expression.

-Rob

Want more performance tips? Come see Oleg Troyansky’s Performance Tuning session and additional tips from other presenters at the “Masters Summit for Qlikview”.


How Not to Choose an Expression

We sometimes have a requirement to select between two or more expressions dependent on user selections or some other condition. For example, this chart Expression:

if(vWithCommission=1
 // Calculate with Commission
 ,sum({<CommissionFlag={1}>}
     SalesAmount - SalesAmount * CommissionRate)
 // Calculate without Commission
 ,sum(SalesAmount)
 )

The if() will select one or the other sum() expression based on the value of the vWithCommission variable. Because a variable is being tested — not a row value — only one sum() will be chosen for the entire chart.

If performance is important, this is not a good way to implement the requirement. QlikView does not “short circuit”. Short circuiting means evaluating only the branch that will actually be returned. QlikView instead calculates both sums and then returns just one of the results.

Some people mistakenly believe that putting the above expression in a variable changes the behavior. This is incorrect. The if() evaluation is still done at the chart level and the performance impact remains.

A performant way to solve the requirement is to put each calculation into a separate chart Expression and use the Conditional property to select the Expression.

An Expression with a Conditional evaluating to False will not be calculated or displayed.
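As a sketch, the chart would get two Expressions with Conditional properties along these lines (the exact conditions are my assumption):

Expression 1: sum({<CommissionFlag={1}>} SalesAmount - SalesAmount * CommissionRate)
Conditional: =vWithCommission=1

Expression 2: sum(SalesAmount)
Conditional: =vWithCommission<>1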

There is also the possibility of choosing the calculation in a variable, but you have to follow a few rules.

  1. The variable should return the string representation of the formula. Note the single quotes in the example below.
  2. The variable definition should begin with “=”. This causes the if() to be evaluated only once.
  3. In the chart Expression, reference to the variable should be made with $(), e.g. $(vChooseCalc)
=if(vWithCommission=1
    ,'sum({<CommissionFlag={1}>}
        SalesAmount - SalesAmount * CommissionRate)'
    ,'sum(SalesAmount)'
)

-Rob

For more performance tips, join us at the upcoming “Masters Summit for Qlikview” in San Francisco May 2015, where Oleg Troyansky presents an always enlightening session on Performance Tuning. Can’t make SF? Check out the other dates and locations on the website.


Performance Tip – Using Dual() and Chart Visual Cues

I recently diagnosed a slow Straight Table chart. The chart contained 100K+ rows. One column contained a complex expression that returned a Y/N string flag. Something like:

=if(complex expression, 'Y', 'N')

They also wanted to set the background color of the cell, green for Y, red for N. So the Expression Background Color property repeated the same complex expression to assign a color:

=if(complex expression, green(), red())

I surmised the expression was being calculated twice for each row. I changed the main expression to set a Dual().

=if(complex expression, dual('Y',1), dual('N',0))

The chart cell still displays the Y/N text. But now I could use 1 and 0 values on the Visual Cues pane and eliminate the Background Color expression entirely. Much faster!

-Rob


Speed up Script Development with BUFFER

A BUFFER prefix on a LOAD or SQL statement creates and maintains an automatic QVD for that statement. Subsequent executions of the LOAD/SELECT statement will read from the QVD, avoiding another (slower) trip to the database. A read from QVD is generally 5-10 times faster than fetching from database.

TranTab:
BUFFER LOAD 
TranID,
Amount,
CustomerID,
etc…
;
SQL SELECT * FROM TRANSACTIONS
;

On first execution, the SELECT will fetch rows from the database and the resulting TranTab will be stored in a specially named QVD on the local machine. On subsequent reloads, TranTab will automatically be loaded from the local QVD.

If you make a change to the TranTab LOAD/SQL statement, QV Reload will detect the change and fetch from the database again and update the local QVD.

During script development it’s not uncommon to perform a reload several times. You can greatly reduce the duration of a script run by adding BUFFER to your statements. Any change you make to a buffered statement will automatically invalidate that buffer and re-fetch from the database.

Don’t forget to remove the BUFFER keyword before moving to production!

You can read more about BUFFER and some optional parameters in the Qlikview Help.
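For example, here is a hedged sketch using the documented STALE AFTER option (the 7-day window is arbitrary). The buffer is used until it is older than the stated limit, after which the database is read again:

TranTab:
BUFFER (STALE AFTER 7 DAYS) SQL SELECT * FROM TRANSACTIONS;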

-Rob


DROP FIELD Does Not Release All Space

During the “Performance Tuning” session at the Barcelona Masters Summit, Oleg Troyansky demonstrated using Document Analyzer to identify unused fields, followed by DROP FIELD statements to remove those unused fields from the data model. Roland Vecera offered an interesting discovery. DROP FIELD after a BINARY LOAD does not free the expected amount of memory.

For Example:
Binary dropfieldstest_dm.qvw;
DROP FIELD A,B;

Roland has found that a subsequent LOAD RESIDENT of each affected table is required to fully reduce disk and RAM consumption to the expected level.
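A hedged sketch of that cleanup, assuming the affected table is named Facts:

Binary dropfieldstest_dm.qvw;
DROP FIELD A, B;

// Rewriting the table releases the record-pointer space
TmpFacts:
NoConcatenate LOAD * RESIDENT Facts;
DROP TABLE Facts;
RENAME TABLE TmpFacts TO Facts;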

A field in a QVW is represented by three storage structures:
1. Symbol table, which stores the unique values of the field.
2. Record pointers, a pointer on each table row to the symbol value.
3. State space, where field selections are tracked.

Based on testing and calculation, my interpretation is that in this scenario (BINARY LOAD/DROP FIELD), the Symbols and State space are released. However, the space occupied by the Record pointers is not released, i.e. the records are not rewritten. This may be a significant amount of space, particularly when a table contains many rows.

For most developers this will be an obscure issue. But for people tuning large applications, this may provide an “aha” moment.

Thanks Roland!


Autonumber() Key Fields and Sequential Integer Optimization

Today let’s reinforce some QV data model principles for us old timers and introduce a few tricks for newbies to Qlikview.

#1. Keys are not data. Key Fields in a Qlikview model should serve the data modeler, and not be used by chart Designers as data.

Consider two tables, OrderHeader (one row for each Order) and OrderDetails (one row for each Order Line), linked together by the field OrderID.


OrderID may be a value that you need to display in your charts. However, problems arise when you try to do things like count(OrderID). Which end of the connection should the expression count? It’s unreliable as discussed in detail here:
http://qlikviewnotes.blogspot.com/2010/01/best-way-to-count-keys-dont.html

The solution is to create a counter field on the table that represents the correct cardinality for the counter. If we are counting Orders, that would be the OrderHeader table.

In the LOAD of the OrderHeader table:

1 as OrderCounter

Part two of the recommendation is to isolate the key field so it is not mistakenly used as a data field. We do this by prefixing the key field name with a special character and setting the QV system variable “HidePrefix” to that character.

SET HidePrefix='%';

In the LOAD of both OrderHeader and OrderDetails:
OrderID as %OrderID

Fields that begin with the HidePrefix value will not show up in:
– Current Selections.
– Dimension or Expression property dialogs (unless “Show System Fields” is checked).

Of course, the specific values of OrderID may be useful to display in UI charts. In that case we must preserve it as a data item in one and only one table. We will include it in the LOAD of the OrderHeader table.


OrderID is available as a data field, but appropriately only from the OrderHeader table.

OrderCounter is now available as a field such that
=sum(OrderCounter)
will yield the correct Order count.

Now we (the Data Modelers!) own those “%” key fields! They are ours; we told the UI designers explicitly that “it is not data”.

Part three, and a very important part indeed, is to autonumber() the key fields. Autonumber() is a Qlikview lookup function that translates parameter values into integers. The sequential integers returned by autonumber() will reduce the RAM requirements and increase the linkage efficiency as detailed here
http://qlikviewnotes.blogspot.com/2008/05/memory-sizes-for-data-types.html
and
http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers

Admittedly, it gets a bit geeky. Bottom line, here is what you want to do for your keys:
autonumber(OrderID, '%OrderID') as %OrderID

The autonumber() function converts values to sequential integers. The second parameter, '%OrderID', is important if you have multiple keys being autonumbered in your script, as it keeps each key in its own distinct counter series.

To summarize:
#1. Keys are not data.
#2. Isolate keys using the “SET HidePrefix=x;” variable. Establish counter fields on the appropriate table.
#3. Use the Autonumber() function to convert key field values to sequential integers. This will minimize the memory footprint of the application and improve the efficiency of cross table look-ups.
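Putting the three recommendations together, a minimal sketch (the QVD sources and all field names other than OrderID are assumptions):

SET HidePrefix='%';

OrderHeader:
LOAD
 autonumber(OrderID, '%OrderID') as %OrderID, // hidden integer link key
 OrderID, // kept as data in this one table only
 1 as OrderCounter,
 OrderDate
FROM OrderHeader.qvd (qvd);

OrderDetails:
LOAD
 autonumber(OrderID, '%OrderID') as %OrderID, // same second parameter, same integer series
 Quantity,
 Price
FROM OrderDetails.qvd (qvd);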

My friend and colleague Barry Harmsen, author of QlikView 11 for Developers, who is a much wiser data modeler than I, will be discussing QV data modeling in depth with me at the Masters Summit for Qlikview in Europe this October. I hope you can join us!


Super Fast Method to Retrieve QVD High Value

Delta Load, sometimes called Incremental Load, is the technique of pulling only changed or new rows from a database and then merging those rows with a master QVD. The Delta Load pattern follows these general steps:

1. Determine high water mark (“last reload”)
2. Build WHERE predicate in syntax of target DB.
3. SQL SELECT delta rows.
4. Merge delta rows with QVD.
5. If Deletes, INNER JOIN entire set of Keys from DB with QVD

The first step is to determine what cutoff value — the delta point — we want to pass in the SQL SELECT WHERE clause to identify new rows. This value is usually the highest value in the existing QVD.

The most robust and reliable method for step one is loading with max() from the existing QVD. For example:
LOAD max(LastModified) as MaxModified FROM myqvd.qvd (qvd);

This works well, but reading the entire QVD can take a very looooong time for a large QVD.

A much faster method is to aggregate the max value for the delta rows only and then save that value on the first row of the QVD. In subsequent delta loads, only the first row of the QVD is read to get the value. This is extremely fast and is not affected by QVD size. For example:

Facts:
SQL SELECT *
FROM mytable WHERE LastModified >= '$(vMaxModified)';

JOIN (Facts)
LOAD max(LastModified) as HighWaterValue
RESIDENT Facts;
// Do QVD Merge… and STORE Facts INTO Facts.qvd

Retrieve the value later with:

FIRST 1 LOAD HighWaterValue FROM Facts.qvd (qvd);

The “HighWaterValue” field will contain a relatively small number of values and will therefore have a negligible impact on the size of the QVD. You will of course have to create the field the first time before trying to load it with the “FIRST 1…”.

If you are using Qlikview Components (QVC) V8+ for delta loading, you can enable this high performance strategy by setting the variable:

SET Qvc.Loader.v.StoreMaxModFieldValue = -1;

QVC will automatically create a high water field named Qvc.MaxModFieldValue and detect the availability of the field on subsequent loads. There is no need to pre-create the field before turning on this feature.

The technique is part of the Advanced Scripting material I’ll be presenting at the upcoming Masters Summit for Qlikview in Europe this October. I’ll be discussing more about Delta Loads, including strategies for merging large QVDs and tables with non-unique keys. Review the full agenda here and decide if the Masters Summit might help take your Qlikview skills to the next level.


Document Compression

Today I offer up a discussion of Qlikview “compression”. That is, the Qlikview features that make data smaller, and in some cases, larger.

Should you care? In most cases no. But understanding what “knobs you can turn” can be a useful tool for capacity planning and application tuning. Let’s look at the practices and parameters that affect data size.

Script Execution: Data read from sources – such as database tables – is read into memory (RAM) by the script execution (reload) process. Duplicate values are reduced to the unique set of values for each column. A “Gender” column has only two values – “Female” and “Male” – so the storage required for this column is minimal compared to a column that has a wide range (cardinality) of values, such as a timestamp. This is not really “compression” but rather what I call “de-duplication”.

The ratio of database storage to document storage is dependent on the data content as well as the use of common script techniques like separating timestamps into date and time fields. A typical database to document ratio is 10:1. For example, 2GB of database tables might require 200MB of document RAM.

QVW write to Disk: After reload, the Qlikview document (data tables and screen objects) is written from RAM to Disk as a *.qvw file. If compression is set on (the default) for the document, the qvw will be compressed as it is written to disk. The compression results vary depending on data content, but are typically in the range of 2-5 times. For example, a document that requires 200MB of RAM will require somewhere between 40MB and 100MB of Disk storage.

If compression is set to “None”, the document will be written to disk in the same format it existed in RAM and will occupy the same storage on disk as it utilized in RAM.
The Compression option for each Document is set in the Document Properties, General tab. The default compression for new documents is defined in the User Settings, Save tab.

The compression option will of course impact the amount of disk storage used. But it also affects the amount of time it takes to read or write a qvw. I find that for most documents, an uncompressed document will write and read significantly faster than a compressed document. Some documents, especially large ones with high compression ratios, will read faster if compressed. The other factor is speed of the disk being used – local disk or network disk.

I typically do my development with compression off and then do a timing test with both options before migrating to the server.

QVW read from Disk: The *.qvw is loaded to RAM by a developer or on the Server by a user session. The amount of RAM required is the uncompressed size, regardless of whether compression was used to write the *.qvw to disk. As discussed in the previous section, my experience is that uncompressed documents read from a local disk typically load up faster, but this is not always true and is worth testing on large documents.

What is the compression factor for QVD files?

Zero.

A QVD file contains the physical representation of an in-memory Qlikview Table. This “RAM image” format is what allows an optimized QVD load to be so quick. The physical blocks of disk are read directly into Qlikview RAM, “ready to go”. Because QVD is the RAM image, there is no compression.

A QVD read with an optimized load will require the same RAM size as its size on disk (1:1). A QVD read with an un-optimized load may require significantly more RAM, due to some numeric fields being converted to strings. The expansion is typically about 2:1 but varies considerably.
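As a hedged illustration (the file and field names are assumptions): any transformation, and most WHERE clauses, force the slower un-optimized path.

// Optimized: the QVD's RAM image is read directly
LOAD * FROM Facts.qvd (qvd);

// Un-optimized: the calculated field forces row-by-row processing
LOAD *, Amount * 1.2 as AmountWithMarkup FROM Facts.qvd (qvd);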

Here is a summary of the various “compression points” and typical results.
Source          Destination     Ratio   Example Result   Notes
Source DB       –               –       2GB              Raw Data
Source DB       Document RAM    10:1    200MB            Data de-duplication
Document RAM    QVW Disk        3:1     67MB             Save Compression=High
Document RAM    QVW Disk        1:1     200MB            Save Compression=None
QVW Disk        Document RAM    1:3     200MB            Save Compression=High
QVW Disk        Document RAM    1:1     200MB            Save Compression=None
Document RAM    QVD Disk        1:1     200MB            QVD always uncompressed
QVD Disk        Document RAM    1:1     200MB            Optimized load
QVD Disk        Document RAM    1:2     400MB            Non-Optimized load

If your documents are small and you are not experiencing performance issues, worry about none of this.

Compressed documents occupy less disk space and their smaller size makes them easier to manage for moving, backup, etc.

If you are trying to get a document to load faster, try turning off document compression and benchmark your results. Consider the type of disk when making this decision. Compression may be more important in a network storage environment, where reducing the amount of data transferred is a significant performance factor.

It’s important to understand that the document compression option has no impact on RAM usage. It only impacts the amount of data read and written to disk.