All posts by Rob Wunderlich

Simplify with Preceding Load

Dec 5, 2009 Rob Wunderlich 6 Comments

Most QV script developers are introduced to “preceding load” as a LOAD that precedes an SQL SELECT. But a LOAD may also precede another LOAD, which can be a very useful tool.

Let’s review a typical preceding load.

Table1:
LOAD Customer, Sales, today(1) as LoadDate ;
SQL SELECT Customer, Country, Sales FROM SalesResults ;

The absence of a “FROM” or “RESIDENT” clause in the LOAD is what makes this a “preceding load”.
The SQL SELECT will be executed first. The results of the SELECT will be used as input to the LOAD statement.
Table1 will have three fields — Customer, Sale, LoadDate.
The Field “Country” will not be present in Table1 because “Country” is not repeated on the LOAD statement.
The field “LoadDate” does not exist in the SQL SELECT and is added by the LOAD.

Let’s look at an example of where preceding load can be useful. When loading data, you may need to use expressions to parse or cleanse data. For example, extracting a timestamp from a string in a text file.

Table1:
LOAD
timestamp(timestamp#(mid(@1:n,3,12), ‘MMDDYYhhmmss’)) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

What if you want additional time dimensions from the data? You could add expressions like:

date(date#(mid(@1:n, 3, 6)) as EventDate
month(date(date#(mid(@1:n, 3, 6))) as EventMonth

The script would soon get messy with “paren-disease” and become harder to maintain. Preceding Load to the rescue.

Table2:
LOAD *,
floor(EventTime) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour
;
LOAD
timestamp(timestamp#(mid(@1:n,3,12), ‘MMDDYYhhmmss’)) as EventTime, mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

The syntax is greatly simplified by reusing the “parsed once” EventTime.
Table2 will contain six fields: Event, EventTime, EventDate, EventMonth, EventYear, EventHour.
The “*” in the top load includes the fields emitted by the bottom load — EventTime & EventTime.

Preceding Loads may also be stacked more than two deep as in this example.

Table2:
LOAD *,
if(match(EventMonth, ‘Aug’, ‘Dec’) OR weekday(EventDate) > 5, ‘Holiday’, ‘Standard’) as Rate;

LOAD *,date(floor(EventTime)) as EventDate,
month(EventTime) as EventMonth,
year(EventTime) as EventYear,
hour(EventTime) as EventHour;

LOAD timestamp(timestamp#(mid(@1:n,3,12), ‘MMDDYYhhmmss’)) as EventTime,
mid(@1:n, 17) as Event
FROM myfile.txt (fix, codepage is 1252);

Preceding load is a useful tool to simplify the syntax of your script and make it easier to maintain.

-Rob

Scripting

Understanding Join and Concatenate

Nov 19, 2009 Rob Wunderlich 19 Comments

The Qlikview script functions JOIN and CONCATENATE can sometimes be used to tackle the same problem, but there are important differences that should be understood.

Examine the sample tables below. Note that they share one common field name, “Key”. Also note that Table1 has a Key value “3” that is not present in Table2.

JOIN will combine rows where the Key value matches. The keyword OUTER will also retain rows that do not match rows in the other table. Here’s what the merged table will look like after an outer join.

OUTER JOIN (Table1) LOAD * RESIDENT Table2;

Values A1 and C1, which were in different tables, now occupy the same row in the result table. The row with Key 3 has missing values for C & D, because there was no matching Key in Table2.

Creating a chart that uses “Key” for dimension will produce results similar to the Table Box above.

The important point is that values with the same Key value have been merged together into a single row. If value A1 is selected, note that values C1 & D1 remain associated (white). The set A1,B1,C1,D1 is indivisible.

Now let’s look at Concatenate. Concatenate appends the rows of one table to another. Concatenate never merges any rows. The number of rows in a concatenated table is always the sum of the rows from the two input tables. Here’s what our sample data will look like after Concatenate.

CONCATENATE (Table1) LOAD * RESIDENT Table2;

Rows with like Key values are not merged together. The rows from Table2 are simply appended to Table1. Because the tables have different fields, rows will have null values for the fields from the “other” table.

If the data is used to build a chart that utilizes the common field “Key” as dimension, the chart looks just like the JOINed table.

Let’s make the selection “A1” in Field A and see it’s impact on our visible charts and tables.

When A1 is selected, the association to C1 & D1 is lost and C/D values become null in both the Chart and Tablebox. We cannot select both A1 and C1. This is a different result than the JOINed example.

Let’s consider a more realistic example where we may choose between JOIN and CONCATENATE. Consider the two tables below. Note that only one BudgetAmount row is present for each Region-Year combination. In the Sales table, the SalesAmount is broken down by Department within Region.

If we load both tables we can produce a chart using expressions like =Sum(BudgetAmount).

The Budget and Sales values have been summed correctly.

We then notice that we have an undesirable synthetic key, created by the Budget and Sales tables sharing the Year and Region fields. One approach to eliminate the synthetic key would be JOIN or CONCATENATE. But which one in this case?

Let’s try JOIN and see what the Chart looks like.

OUTER JOIN (Budget) LOAD * RESIDENT SALES

The summed Budget numbers are incorrect!

A look at raw data of the joined table will identify the problem. The JOIN repeated the BudgetAmount value on each Department row.

Let’s try with CONCATENATE.

CONCATENATE (Budget) LOAD * RESIDENT Sales;

The numbers are now correct and we’ve accomplished the goal of eliminating the synthetic key.

A peek at the data in the Concatenated table will make it clear why the chart is now correct. There is only BudgetAmount value or each Year-Region.

JOIN and CONCATENATE are both very useful and frequently used functions in Qlikview. It’s important to understand the differences between them.

-Rob

Scripting

BOQC: Flexible Interval Classification

May 13, 2009 Rob Wunderlich

Another post in the “Best of QlikCommunity” series.

In this QlikCommunity Forum thread the poster asked about using the class() function to create a dynamic dimension of 30 minute intervals in a chart. He wanted to format the class values as display friendly time ranges.

My solution would have been to use mapping to format the classes to the desired display format. However, this would have been a lot of data entry for 48 intervals in a day.

John Witherspoon posted a more elegant solution utilizing a simple expression.

dual(time(floor(timestamp, 1/48),’h:mm TT’) & ‘ – ‘ &
time(ceil (timestamp, 1/48),’h:mm TT’)
,floor(frac(timestamp),1/48))

Using John’s example, I was able to extend the idea to easily provide for a user selectable interval size.

Read the thread for details.

-Rob

Tools

Analyzing Field Usage in a QVW

Apr 23, 2009 Rob Wunderlich 12 Comments

Wouldn’t it be nice to know what fields are “unreferenced” in your document? By unreferenced, I mean fields that are not used in Expressions, Dimensions or Keys. These are fields that if removed from your document, would have no impact on the visible elements of the application.

Removing unused fields is sometimes important in addressing performance issues and generally makes your application easier to maintain and understand.

The Expression Overview dialog is great for finding where a field is used, but what about the unused fields? I don’t know of any QV supplied tool that can identify unused fields, so I created one myself. My tool is a QVW named “DocumentAnalyzer” and it’s available for download from:

http://robwunderlich.com/Download.html

First off, let me make it clear that this tool is imperfect. It’s difficult to do a precise field usage analysis from “outside” of the product. I hope that this work of mine will encourage (goad?) Qliktech into writing a field usage analysis tool within the QV product. I’ll be happy if my work becomes obsolete.

I’ll explain the usage and limitations of DocumentAnalyzer as well identify some interesting code tidbits for anyone who may want to enhance or borrow from this app.

The app code itself consists of two pieces. 1) A Macro Module that extracts meta information from the document to be analyzed (the “target” document) 2) A load script that processes the extracted meta data.

The Macro does a lot of filesystem access and requires System Access module security. If you have not allowed System Access, the macro will warn you and provide instructions for setting it.

Using DocumentAnalyzer is a two step process, driven by buttons in the Main sheet. The first step is to choose a target document. Pressing the button will bring up a standard windows file chooser dialog. If a file chooser dialog cannot be created, the user is instructed to type a filename directly in the input box.

Once a target document is selected, press the “Process Document” button. The Macro module will be invoked and extract the metatdata to a series of files in your temp directory. After extraction the load script will read the extracted files. At the end of the load script, the metadata files will be deleted.

The tool will open the target document and navigate through the screens. Keep your hands off the keyboard while this process runs. When the load script is complete, you’ll receive a msgbox like this. Press OK.

The first chart of interest can be found on the “Fields” sheet. The Field References chart lists each FieldName and indicates whether the FieldName was used as a Key, Dimension, In an Expression, Macro or Variable. FieldNames not referenced anywhere are highlighted in yellow. These are fields you might consider dropping from the document.

FieldNames that contain special characters are flagged in red. The usage of these fields cannot be accurately determined due to limitations of the parsing method I’m using.

The Exception sheet displays you may consider warnings about the data presented. Of particular interest is the “Unmatched Fields” listbox. Check out the Help in this listbox to understand it’s contents.
The Parsing Algorithm
The identification of field rerefences is performed by the load script. The parsing is rather primitive so I’ll provide some explanation here so you can understand it’s limitations. You can also find this information on the “About” sheet of DocumentAnalyzer.
Field names are discovered in expressions by replacing “special” characters with a delimiter and then parsing into words using the subfield() function. For example, the expression:

“Avg(Price * Quantity) * 1.05”is delimited to:
“AvgPrice Quantity 1.05”which is then parsed into four words — Avg, Price, Qty, 1.05 . The words are then matched against fieldnames. You will get a false match if a function name such as “Avg” is also used as a field name. If you just want use DocumentAnalyzer, no need to read further. If you want to learn something about the code, read on.

The Code

The Macro writes the metadata to a series of files. The files are normally deleted at the end of the script.
If you want to keep the metadata files, comment out this line in the script (on the Cleanup tab):
LET x = DeleteFolder(‘$(f)’); // Delete the data files

The Macro Module is of a fair size — about 500 lines. If you examine the code, you’ll find some conditional code (IsInternal()) devoted to allowing the macro to run internally in a qvw or externally from cscript.exe. The entire macro module can be copied to an external file and run with cscript. I coded for “dual” execution environments because I get a better editor and debugger in the external environment. I do the development running externally and then paste the script into the QVW for final testing.

Meta information (Field names, Dimensions, Expressions, etc) are extracted from the target document using the QV API. Getting Dimension values was fairly easy, they are only a few API paths for the various object types.

Finding all the expressions was the most challenging part and took the most time to solve. There are many different properties where expressions may be used in sheet objects. They may also differ by release. I could not see discovering and writing all the API calls to extract every possible expression. I experimented with a number of approaches, including generating code from the API doc — never got this to work correctly. The most promising approach was using the file export from the Expression Overview dialog. This gave me a complete list of expressions, but the exported file was not consistently usable. The export file is a tab delimited file. If an expression uses tabs or newlines it can make file impossible to navigate.
My eventual solution was to export the objects into XML using the
WriteXmlPropertiesFile outputFile API method and then extract the expressions from the XML files. I first tried to get the expressions using load script, but found this too cumbersome. I settled on using XPath to extract the data I needed from the XML files. XPath is a sort of “query language” for XML. Where SQL returns a set of rows, XPath returns a set of XML elements. This required only a few XPath expressions to cover all the possible expressions.
Once I perfected the XPath method, I switched to doing the Dimension extraction this way as well.
In QV 8.5, Sheets do not have an XML representation. So Sheet expressions (background color, conditional show, etc) are extracted indivdually.
The Document Variables. Macro Module and Script are written to files as well. The Script is not currently processed by the load.
Contact me if you have any problems or questions on using the app. Contact information is on the “About” sheet.
Happy analyzing!
-Rob

Scripting

BOQC: ApplyMap instead of Join

Apr 23, 2009 Rob Wunderlich 3 Comments

Today marks my first blogging of “Best Of QlikCommunity” (BOQC) where I plan to highlight what I find to be particularly useful or interesting posts on the QlikCommunity Forums.

There are cases when the ApplyMap() function is a very useful alternative to Join. For some time I have noticed both Oleg Troyansky and Michael Nordstrom dropping this hint on QlikCommunity but I never quite understood the power of the technique until a post Oleg made today:
http://community.qlikview.com/forums/p/16017/62222.aspx#62222

The original forum question was how to multiply two fields from different tables to derive a new calculated field. The tables share a common key field.

I usually would have approached this with two Joins. That approach works, but sometimes I don’t really want my data model to reflect Joined tables. I just want to do the calculation.

If you want to see the ApplyMap() solution, read the post linked to above. The thread explains it better than me repeating it here.

Visualization

Effective Visual Communication

Apr 21, 2009 Rob Wunderlich 1 Comment

Now you own a brand new power saw (Qlikview). But do you know how to build a house?

A key feature of Qlikview is graphical representation of data. Do you have the skills to use that capability effectively? The challenges of effective visual communication are not unique to Qlikview. The principles of visual communication can be learned and there are many resources available to learn from.

An important and useful work to me has been the book “The Visual Display of Quantitative Information” by Edward R. Tutfe (ISBN-0961392142). In this book, Tufte defines and articulates many important principles of statistical graphical representation, including the “Lie Factor” “Chartjunk”, and the “Data-Ink ratio”. The principles of Chartjunk and the Data-Ink ratio ask us to consider which of these two charts communicate more effectively.

Tufte has authored several more books that expand and illuminate these themes of quality and effectiveness in graphical communication. All of the books are beautifully printed. Tufte also teaches seminars and maintains a web site at: http://www.edwardtufte.com/tufte/

Another work I have found very useful is “Information Dashboard Design” by Stephen Few (ISBN-0596100167). This book applies visual communication principles from Few, Tufte and other researchers to the practical application of “dashboard design” . Few’s book analyzes the shortcomings of some poor designs and then goes on to enumerate a number of elements for effective designs. He closes the book with some examples of optimal designs.

Stephen Few also teaches public workshops. His schedule and blog can be found at http://www.perceptualedge.com/.

Lastly, the IBM Many Eyes project http://manyeyes.alphaworks.ibm.com/ is an ongoing source of inspiration and amusement.

-Rob

Scripting

Using MapSubstring() to edit strings

Jan 9, 2009 Rob Wunderlich 6 Comments

The MapSubstring() function is a powerful alternative to using nested Replace() or PurgeChar() functions.

MapSubstring(), unlike it’s siblings ApplyMap() and Map, will apply multiple mappings from the mapping table. Here’s an example.

ReplaceMap:
MAPPING LOAD * INLINE [
char replace
)
)
”
,
/
–
] (delimiter is ‘ ‘)
;

TestData:
LOAD
*,
MapSubString(‘ReplaceMap‘, data) as ReplacedString
;
LOAD * INLINE [
(415)555-1234
(415)543,4321
“510”123-4567
/925/999/4567
] (delimiter is ‘ ‘)
;

In field “ReplacedString“, all the characters matching the first field of the map (“char”) are replaced with a backslash as shown in this table. This makes it ready for parsing with a function like SubField().

Another usage is an alternative to nested PurgeChar() to remove multiple characters. A blank is used as the mapping character. For example:

PurgeMap:
MAPPING LOAD * INLINE [
char replace
)
)
“
,
/
–
] (delimiter is ‘ ‘)
;

MapSubString(‘PurgeMap’, Data) will produce results like this:

-Rob

Visualization

The design of grouped objects

Jan 5, 2009 Rob Wunderlich 5 Comments

I recently submitted a feature request to Qliktech requesting “grouped objects”. There are a number of issues to consider when thinking about how grouped objects should be implemented in QV.

You are probably familiar with the idea of grouped objects. It’s a feature commonly available in graphics authoring programs Select a set of objects and mark them as “grouped”. Operations such as Move or Resize then apply to the group.

The problems I’m trying to address by asking for grouped objects:

When two or more objects are logically related, it’s would be useful if they could be minimized or restored together.

It’s a common practice to layer objects upon one another to produce an effect that is not available with a single object. For example, Stephen Few’s Bullet Graph is commonly implemented in QV as a chart upon a chart. The Qliktech demo “Finance Controlling” uses this technique. Another common use case is placing a transparent text box on a chart to provide helptext, or a button to provide action.
Currently, if you layer objects, you also have to prohibit the user from moving or resizing the objects. If not, user move or resize operations may cause the objects to be misaligned.

If QV would let me select a set of objects and mark them as a “group”, how should the group of objects respond to various user actions?

Let’s first consider a group of two objects that are related, but do not overlap on the screen. For example, a straight table and text box. The text box contains some summary data from the table. Here’s are some possible behaviors:

If the user moved any object, all objects would move as a group, maintaining relative position to one another.
If the user resizes one object, all objects will resize proportionally and maintain position relative one another.
If the user maximizes one object, the group will enlarge proportionally to fill the screen.

Let’s consider another case. Two overlayed bar charts are used to create a Bullet Graph. One chart also contains a button in the caption area. It’s critical that the overlayed charts maintain position and size relative to one another. The button is a fixed artifact that should not resize but must remain in the “correct” position. How should the behavior differ from the example above?

If one of the charts is resized or maximized, both charts must resize. The button should not resize. This could be controlled by use of the “allow move/resize” property of each object. If the property is checked, the object will participate in a group resize. If unchecked, the object will not resize with the group. However, all objects will maintain relative position.

What specifically, is “relative position”? I define it as the “x,y pixel distance from the nearest corner of a parent object. The parent is the object in the same group who’s corner is closest to the child object”. An object can only be relative to one parent object. The parent may change after a resize operation.

It would simplify the grouping process if the “Allow Move/Size” property were split into two separate properties — “Allow Move” and “Allow Size”. In the chart & button example, all objects would “Allow Move” but only the charts would “Allow Size”.

Your comments and improvements on this proposal are welcome.

-Rob

Update 01/09/2009: What about Object “Print” and “Copy To Clipboard -> Image” functions? My proposal would be to include all objects that share a pixel with the right-clicked object.

Scripting, Tools

Extracting data using Microsoft Logparser

Dec 11, 2008 Rob Wunderlich 3 Comments

I’ve fielded several questions lately about loading Group membership information from Active Directory. The Active Directory sample in the Qlikview Cookbook uses AdsDSO and can load “single-valued” fields such as Name and Mail. AdsDSO will not load “multi-valued” fields such as “memberOf” or “member” — fields that define group members.

To read multi-valued fields, you’ll have to install some sort of tool to extract the data into a format QV can load. There are a number of free and commercial utilities available that will extract AD information into a text file.

My favorite tool for AD extracts is the free Microsoft Logparser. Google for it, you’ll find lots of information as well as the download link. There is also a Logparser book and forum available.

Logparser can read data from many different inputs — Active Directory, IIS logs, Windows Event logs, Registry — to name a few. Logparser can write to several different output formats, CSV being the most useful for QV.

Logparser uses a SQL syntax for it’s queries. Here’s an example:

logparser -objClass:Group “select cn, member into tmpAdGroups.csv from LDAP://mydomainController”

This Logparser query will create the output file “tmpAdGroups.csv”. The file will contain one row for each group (cn). The members of the group will be returned as a single field with the members separated by the pipe “|” character. The members are easily separated in the QV load using the QV subfield() function:

subfield(member, ‘|’) as member

Other uses I’ve found for using Logparser with Qlikview:

Extracting data from Windows Event logs.
Preprocessing IIS log files. The fields contained in a IIS log can vary between sites and may also change dynamically within the same physical file. Logparser can neutralize these differerences and produce a common input for QV load.

Logparser is a favorite tool of mine. I use it frequently for non-Qlikview tasks as well.

-Rob

Update 12/12/2008 I’ve published a complete example of using logParser to extract Group and User data from AD for loading into QV. The example is in version 9 of the Qlikview Cookbook available at http://robwunderlich.com/Download.html

Visualization

Using the Evaluate API in Macros

Nov 20, 2008 Rob Wunderlich 3 Comments

A common application of QV macros is to make initial selections. These selections may be applied when the document is opened, a sheet is activated or a button is pressed. The selections usually involve some type of dynamic value, such as “today()”.

In the examples below, the VBS continuation character “_” is used to indicate the statement continues on the next line. You may type the statement on a single line if wish and omit the “_”.

A typical macro to select yesterday’s date looks like this:

ActiveDocument.GetField(“ShipDate”).Select _
Date()-1

It’s all VBScript, but the functions used come from two different products. The ActiveDocument…Select is from the QV API, and the Date() function is a VBScript function. To make this work, you need some knowledge of VBS functions. In addition, there may also be a type mismatch — the VBS function returns a number and the Select expects a string. Fixing that issue requires adding a CStr() or CDbl() function.

What if you want to make a more complex selection. Like the weekday for today? Or a selection that requires knowledge of the data values — max(ShipDate)?

Wouldn’t it be easier if you could use QV Expressions to define the selection expression? You can, with the ActiveDocument.Evaluate API function. Here are some examples.

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“date(today(1)-1)”)

ActiveDocument.GetField(“Weekday”).Select _
ActiveDocument.Evaluate(“weekday(today(1))”)

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“max(ShipDate)”)

The Evaluate argument is any expression that can be evaluated by QV. So you can stick with the QV functions you are used to. Those functions are also more likely to produce the correct data type for your selection. And most importantly, you get easy access to the QV data — (“max(ShipDate)”).

You can even use QV search operators such as “>”. For example, to select the last 7 days:

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“‘>’ & date(today(1)-7)”)

I hope you find this tip useful.

-Rob

Recipes for Qlikview Success