All posts by Rob Wunderlich

Effective Visual Communication

Now you own a brand new power saw (Qlikview). But do you know how to build a house?
A key feature of Qlikview is graphical representation of data. Do you have the skills to use that capability effectively? The challenges of effective visual communication are not unique to Qlikview. The principles of visual communication can be learned and there are many resources available to learn from.
An important and useful work to me has been the book “The Visual Display of Quantitative Information” by Edward R. Tutfe (ISBN-0961392142). In this book, Tufte defines and articulates many important principles of statistical graphical representation, including the “Lie Factor” “Chartjunk”, and the “Data-Ink ratio”. The principles of Chartjunk and the Data-Ink ratio ask us to consider which of these two charts communicate more effectively.

Tufte has authored several more books that expand and illuminate these themes of quality and effectiveness in graphical communication. All of the books are beautifully printed. Tufte also teaches seminars and maintains a web site at: http://www.edwardtufte.com/tufte/

Another work I have found very useful is “Information Dashboard Design” by Stephen Few (ISBN-0596100167). This book applies visual communication principles from Few, Tufte and other researchers to the practical application of “dashboard design” . Few’s book analyzes the shortcomings of some poor designs and then goes on to enumerate a number of elements for effective designs. He closes the book with some examples of optimal designs.

Stephen Few also teaches public workshops. His schedule and blog can be found at http://www.perceptualedge.com/.

Lastly, the IBM Many Eyes project http://manyeyes.alphaworks.ibm.com/ is an ongoing source of inspiration and amusement.

-Rob
Share

Using MapSubstring() to edit strings

The MapSubstring() function is a powerful alternative to using nested Replace() or PurgeChar() functions.

MapSubstring(), unlike it’s siblings ApplyMap() and Map, will apply multiple mappings from the mapping table. Here’s an example.

ReplaceMap:
MAPPING LOAD * INLINE [

char replace
)
)

,
/

] (delimiter is ‘ ‘)
;


TestData:
LOAD
*,
MapSubString(‘ReplaceMap‘, data) as ReplacedString

;
LOAD * INLINE [

(415)555-1234
(415)543,4321
“510”123-4567
/925/999/4567
] (delimiter is ‘ ‘)
;


In field “ReplacedString“, all the characters matching the first field of the map (“char”) are replaced with a backslash as shown in this table. This makes it ready for parsing with a function like SubField().

Another usage is an alternative to nested PurgeChar() to remove multiple characters. A blank is used as the mapping character. For example:

PurgeMap:
MAPPING LOAD * INLINE [

char replace
)
)

,
/

] (delimiter is ‘ ‘)
;



MapSubString(‘PurgeMap’, Data)
will produce results like this:

-Rob

Share

The design of grouped objects

I recently submitted a feature request to Qliktech requesting “grouped objects”. There are a number of issues to consider when thinking about how grouped objects should be implemented in QV.

You are probably familiar with the idea of grouped objects. It’s a feature commonly available in graphics authoring programs Select a set of objects and mark them as “grouped”. Operations such as Move or Resize then apply to the group.

The problems I’m trying to address by asking for grouped objects:

  • When two or more objects are logically related, it’s would be useful if they could be minimized or restored together.

 

  • It’s a common practice to layer objects upon one another to produce an effect that is not available with a single object. For example, Stephen Few’s Bullet Graph is commonly implemented in QV as a chart upon a chart. The Qliktech demo “Finance Controlling” uses this technique. Another common use case is placing a transparent text box on a chart to provide helptext, or a button to provide action. 

    Currently, if you layer objects, you also have to prohibit the user from moving or resizing the objects. If not, user move or resize operations may cause the objects to be misaligned.

If QV would let me select a set of objects and mark them as a “group”, how should the group of objects respond to various user actions?

Let’s first consider a group of two objects that are related, but do not overlap on the screen. For example, a straight table and text box. The text box contains some summary data from the table. Here’s are some possible behaviors:

  1. If the user moved any object, all objects would move as a group, maintaining relative position to one another.
  2. If the user resizes one object, all objects will resize proportionally and maintain position relative one another.
  3. If the user maximizes one object, the group will enlarge proportionally to fill the screen.

Let’s consider another case. Two overlayed bar charts are used to create a Bullet Graph. One chart also contains a button in the caption area. It’s critical that the overlayed charts maintain position and size relative to one another. The button is a fixed artifact that should not resize but must remain in the “correct” position. How should the behavior differ from the example above?

If one of the charts is resized or maximized, both charts must resize. The button should not resize. This could be controlled by use of the “allow move/resize” property of each object. If the property is checked, the object will participate in a group resize. If unchecked, the object will not resize with the group. However, all objects will maintain relative position.

What specifically, is “relative position”? I define it as the “x,y pixel distance from the nearest corner of a parent object. The parent is the object in the same group who’s corner is closest to the child object”. An object can only be relative to one parent object. The parent may change after a resize operation.

It would simplify the grouping process if the “Allow Move/Size” property were split into two separate properties — “Allow Move” and “Allow Size”. In the chart & button example, all objects would “Allow Move” but only the charts would “Allow Size”.

Your comments and improvements on this proposal are welcome.

-Rob

Update 01/09/2009: What about Object “Print” and “Copy To Clipboard -> Image” functions? My proposal would be to include all objects that share a pixel with the right-clicked object.

 

 

Share

Extracting data using Microsoft Logparser

I’ve fielded several questions lately about loading Group membership information from Active Directory. The Active Directory sample in the Qlikview Cookbook uses AdsDSO and can load “single-valued” fields such as Name and Mail. AdsDSO will not load “multi-valued” fields such as “memberOf” or “member” — fields that define group members.

To read multi-valued fields, you’ll have to install some sort of tool to extract the data into a format QV can load. There are a number of free and commercial utilities available that will extract AD information into a text file.

My favorite tool for AD extracts is the free Microsoft Logparser. Google for it, you’ll find lots of information as well as the download link. There is also a Logparser book and forum available.

Logparser can read data from many different inputs — Active Directory, IIS logs, Windows Event logs, Registry — to name a few. Logparser can write to several different output formats, CSV being the most useful for QV.

Logparser uses a SQL syntax for it’s queries. Here’s an example:

logparser -objClass:Group “select cn, member into tmpAdGroups.csv from LDAP://mydomainController”

This Logparser query will create the output file “tmpAdGroups.csv”. The file will contain one row for each group (cn). The members of the group will be returned as a single field with the members separated by the pipe “|” character. The members are easily separated in the QV load using the QV subfield() function:

subfield(member, ‘|’) as member

Other uses I’ve found for using Logparser with Qlikview:

  • Extracting data from Windows Event logs.
  • Preprocessing IIS log files. The fields contained in a IIS log can vary between sites and may also change dynamically within the same physical file. Logparser can neutralize these differerences and produce a common input for QV load.

Logparser is a favorite tool of mine. I use it frequently for non-Qlikview tasks as well.

-Rob

Update 12/12/2008 I’ve published a complete example of using logParser to extract Group and User data from AD for loading into QV. The example is in version 9 of the Qlikview Cookbook available at http://robwunderlich.com/Download.html

Share

Using the Evaluate API in Macros

A common application of QV macros is to make initial selections. These selections may be applied when the document is opened, a sheet is activated or a button is pressed. The selections usually involve some type of dynamic value, such as “today()”.

In the examples below, the VBS continuation character “_” is used to indicate the statement continues on the next line. You may type the statement on a single line if wish and omit the “_”.

A typical macro to select yesterday’s date looks like this:

ActiveDocument.GetField(“ShipDate”).Select _
Date()-1

It’s all VBScript, but the functions used come from two different products. The ActiveDocument…Select is from the QV API, and the Date() function is a VBScript function. To make this work, you need some knowledge of VBS functions. In addition, there may also be a type mismatch — the VBS function returns a number and the Select expects a string. Fixing that issue requires adding a CStr() or CDbl() function.

What if you want to make a more complex selection. Like the weekday for today? Or a selection that requires knowledge of the data values — max(ShipDate)?

Wouldn’t it be easier if you could use QV Expressions to define the selection expression? You can, with the ActiveDocument.Evaluate API function. Here are some examples.

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“date(today(1)-1)”)

ActiveDocument.GetField(“Weekday”).Select _
ActiveDocument.Evaluate(“weekday(today(1))”)

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“max(ShipDate)”)

The Evaluate argument is any expression that can be evaluated by QV. So you can stick with the QV functions you are used to. Those functions are also more likely to produce the correct data type for your selection. And most importantly, you get easy access to the QV data — (“max(ShipDate)”).

You can even use QV search operators such as “>”. For example, to select the last 7 days:

ActiveDocument.GetField(“ShipDate”).Select _
ActiveDocument.Evaluate(“‘>’ & date(today(1)-7)”)

I hope you find this tip useful.

-Rob

Share

The match() Function

In SQL, the “in” operator is commonly used to test if a value exists in a list of values. For example:

SELECT * FROM TABLE WHERE CODE IN (‘a’, ‘b’, ‘f’);

New QV developers often spend some time looking for QV’s equivalent of the “in” operator. It’s the match() function.

LOAD * WHERE match(CODE, ‘a’, ‘b’, ‘f’);

The match() function, and its siblings “mixmatch” and “wildmatch”. are documented in the “Conditional Functions” section of the Ref guide and the help:

match( s, expr1 [ , expr2, …exprN ] )
Compares the string s to a list of strings or string expressions. The result of the comparison is an integer indicating which of the comparison strings/expressions matched. If no match is found, 0 is returned. The match function performs a case sensitive comparison.

mixmatch() works just like match() except it does a case insensitive comparision.

wildmatch() is another form that can be particularly useful. wildmatch() allows (but does not require) the “?’ and “*” wildcard characters in the match arguments.

wildMatch(text, ‘*error*’)

will match:

“An error has occurred”
“Error processing account nnnn”

QV 8.5 provides a “like” operator that allows for testing against a single value with wildcards:
text like ‘*error*’

Wildmatch can test against multiple values:

wildMatch(text, ‘*error*’, ‘*warning*’)

The match() functions return a number indicating which of the comparison strings was found. You can use this index number nested in a pick function to do “wildcard mapping” as an alternative to a nested if() function.

pick(
wildmatch(PartNo,
‘*99’, ‘P1586’, ‘?15*’, ‘?17*’, ‘*’
),
‘Taxes’, ‘Premium Fuel’, ‘Fuel’, ‘Lubricant’, ‘Other’)

-Rob

Share

An Example is Worth a Thousand Thread Replies

There’s a lot of information being exchanged on the QlikCommunity Forum http://www.QlikCommunity.com these days. Customers and Consultants ask technical questions, and other Customers, Consultants and QT employees provide very useful answers. Today’s post is a tip on how to improve the chances of your Forum question being answered quickly and accurately.

Many back and forth replies to a forum thread are about clarifying the question. If possible, post a qvw file example with your question. (I can’t, my file is too big! The data is private! Keep reading for ways to handle these concerns).

Reasons to post an example qvw:

  • An example will help clarify your question. The Forum is conducted in English, but English is a second language for many, if not most, of the Forum users. An example will provide additional understanding of your question.
  • More likely to get an accurate and complete response. Many questions require the responders to fiddle with expression or script syntax. If I have a qvw to work with, I’m more likely to test my answer before posting it, saving you the trouble of learning that I forgot a comma in my recommended solution.
  • Time. Most Forum members answer questions on a volunteer basis and their time is limited. For myself, I can only take the time to answer a limited number of questions. I’m more likely to pick the questions that are clear and provide the data I need. If I have to code up my own test data to work on the problem, I’m less likely to respond.

Some of the reasons you may be reluctant to post your qvw — size and privacy.

The maximum attachment size allowed on the Forum is 1MB. You can make the example qvw smaller by using the QV Data Reduction feature.

  1. Make some selections to reduce the number of selected values in the qvw.
  2. From the menu bar, select File->Reduce Data ->Keep Possible Values.
  3. Use File->Save As to save the reduced copy under a new name.

    If you use “Save”, QV will still open the “Save As” to help you remember not to overwrite the master copy.

You can protect the privacy of sensitive information, such as account numbers, revenue or customer names by using the QV Scrambling feature. In the menu bar, select Settings->Document Properties->Scrambling.

Here you can select a field to scramble and press the “Scramble” button to perform a random scrambling of the field . No one can determine it’s original contents. Like values will scramble to the same value which maintains the value linkages.

In some cases, you may still be unable to post your qvw even with reduction and scrambling. Or it may make your example more clear to post the data inline with your question. In that case, post your example data in the question using comma delimited format, so it can easily be pasted to a LOAD INLINE. For example:

Accounts:
AccountNo, Name
1234, ABC Corp
4567, DEF Co

Transactions:
AccountNo, TranId, Amount
4567, 1, 2000

One last tip. Before you post, remember to search for existing answers to your question. In the past, search on the Forum was not so robust. But QT has recently added an embedded Google search feature. This is great! It supports the full range of Google search operators. Try it. The “Google Custom Search” link is available at the top of each Forum page.

Finally, don’t forget to mark your question as “solved” when you’ve received a satisfactory answer.

Happy posting!

-Rob

Share

Loading Multiple Excel Sheets

Load from Excel is usually pretty straightforward, but sometimes you’ll need to load multiple sheets and make some determinations at runtime. Details such as sheetnames may not be known at script creation time.

The QV statements “SQLTables” and “SQLColumns” may be used to discover information about the sheets and columns available in a workbook. Both of these statements require an ODBC connection. The ODBC connection may also be used to subsequently read the data, but I find using the LOAD biff more convenient.

First make a OLEDB connection to the workbook:
CONNECT TO [Provider=Microsoft.Jet.OLEDB.4.0;Data Source=workbook.xls;Extended Properties=”Excel 8.0;”];

Specify the workbook name, relative to the current directory, in the “Data Source=” parameter. This example uses a “DSN-less” connection. It does not require you to predefine an ODBC datasource.

The SQLTables statement return a set of fields describing the tables in the currently connected ODBC datasource, in this case the workbook. A “Table” is an Excel Sheet.

tables:
SQLtables;

Now I’ve got a list of sheets in the QV “tables” table. The field name that contains the sheetname is “TABLE_NAME”. I’ll loop through the set of TABLE_NAME values and load each one using a standard biff LOAD.

FOR i = 0 to NoOfRows(‘tables’)-1
LET sheetName = purgeChar(peek(‘TABLE_NAME’, i, ‘tables’), chr(39));
Sales:
LOAD *
FROM workbook.xls (biff, embedded labels, table is [$(sheetName)]);
NEXT

Sheetnames that contain blanks will be surrounded by single quotes. The purgeChar() function above removes any single quotes that may be present in the sheetname.

What if I only want to load those sheets names whose name begins with “Sales”? Wrap the LOAD statement in an IF statement to test the sheetname:

IF wildmatch(‘$(sheetName)’, ‘Sales*’) THEN
LOAD …..
END IF

How about this case? I want to load any sheet that contains the three columns “Sales”, “Year” and “Quarter”:

columns:
SQLColumns; // Get list of columns
// Join list with columns of interest
RIGHT JOIN (columns) LOAD *;
LOAD * INLINE [

COLUMN_NAME
Quarter
Sales

Year
]
;

// Create a count of how many columns of interest each sheet has
selectSheets:
LOAD TABLE_NAME as SheetName, count(*) as count
RESIDENT columns
GROUP BY TABLE_NAME
;
// Keep only the SheetName that have all 3 columns
RIGHT JOIN
LOAD SheetName
RESIDENT selectSheets
WHERE count = 3


// Load the selected sheets
FOR i = 0 to NoOfRows(‘selectSheets’)-1
LET sheetName = purgeChar(peek(‘SheetName’, i, ‘selectSheets’), chr(39));
LOAD….
NEXT


You may wonder if you could use the Excel Driver instead of the Jet provider like this:

CONNECT TO [Provider=MSDASQL;Driver={Microsoft Excel Driver (*.xls)};DBQ=workbook.xls];

The connection will complete and you can use this connection for SQL SELECTs. However, when SQLTables is called, the connection will enumerate tables/columns for all the *.xls files in the current directory.

This provider uses the parameter “DefaultDir=” (default is .) to control which directory is enumerated for SQLTables and SQLColumns calls. The DBQ parm plays no part. You may find this useful as an alternative to using a traditional “for each filelist…” loop to process multiple files.

Complete text of the examples presented here can be found in the QV Cookbook at:
http://robwunderlich.com/Download.html)

Share

Using the Google Chart API inside Qlikview

I recently had a requirement to create a heat map of the US States and set about exploring if there was an easier way than creating a QV scatter plot mapped over an image.

I found the Google Chart API. With some help from the QlikCommunity forum, I got a fairly pleasing result.

The Google Chart API is provided free by Google. You pass in an http request with parameters that describe the data and desired layout, and a chart image is returned. I won’t cover the details of the API, it’s well documented at http://code.google.com/apis/chart/. Rather, I’ll share my experience integrating it with QV.

Here’s a screenshot of my results. The map will update as selections are made in the Sales chart. The generated map is not clickable. It’s just a static image.

A working QVW of the above may be found in the Qlikview Cookbook at:
http://robwunderlich.com/Download.html

The map is a Straight Table with a single expression. The expression is the http://… string used to generate the map. The representation for the expression is set to “Image”. Thanks to Tom on the forum for showing me this technique.

The OnAnySelect document event is used to trigger a “showMap()” macro that creates the variables needed for the http string. In a production application, you would want to be more selective and use field level events on the fields relevant to the chart.

In addition to chart layout parameters, the http string contains two parameters that describe the data.

  • chld= provides the list of states
  • chd= provides the data values for the states

The States and Values are associated by ordinal position in the respective lists. I could not find a way to keep the lists in sync by using QV expressions alone. The solution was to use the Sales chart as my “data source”. The macro walks the rows of the table to build two variables — the State codes and the Sales values. Here’s a snippet of the macro code:

set obj = ActiveDocument.GetSheetObject("CH01")
' Collect the locations
locations = ""
for RowIter = 1 to obj.GetRowCount-1
locations = locations & obj.GetCell(RowIter,0).Text
next
ActiveDocument.GetVariable("vValues").setContent values, false

Another area where VBScript was useful was in encoding the data values as specified by the API. I chose the “simple encoding” method. The Sales values are translated to relative values within a range of single characters defined by the API. The doc http://code.google.com/apis/chart/#simple provided a javascript encoding example which I converted to VBS.

The encoding algorithm requires that the maximum value of the dataset be known to properly spread the individual values across the relative range. To determine the maxvalue in the macro, I use the QV evaluate() function to “callback” to the QV expression language.

maxValue = ActiveDocument.evaluate(
“max(aggr(sum(Sales), State))” )

Producing a chart with the Google API does have some downsides. The user must be connected to the internet and the chart will render slower than a native QV chart. It also does not provide for making selections in the chart and tooltip values like a QV chart does. But I found it to be a simple solution to my requirement. I hope that someday the QV product will provide regional maps as chart types.

Update October 3, 2008: Alistair on the QlikCommunity forum has posted an example of calling the Google Chart API without using macros, which I find to be the preferred method:

http://www.qlikcommunity.com/575/?tx_mmforum_pi1%5Baction%5D=list_post&tx_mmforum_pi1%5Btid%5D=3763&tx_mmforum_pi1%5Bfid%5D=9

The next update to the Qlikview Cookbook will include the “macro-less” technique.

Share

Memory sizes for data types

An earlier post of mine When less data means more RAM discussed the ways in which storage (“Symbol” space) needed for field values can increase depending on how a field is loaded or manipulated. This generated some followup questions on the QlikCommunity forum about the optimal storage sizes for fields of various data types.

What’s presented below is information gleaned from the documentation, QT Support and experimentation. The numbers come from the document memory statistics file. I hope someone from QT will help me correct any errors.

QV fields have both an internal and external representation. There is a video “Datatype Handling in Qlikview” available on QlikAcademy that explores this subject.This post is concerned with the internal storage of fields.

Numbers

I’ve found that the storage size appears to be related to the number of total digits. Storage size in bytes, for various digit ranges:

1-10 digits, size=4
11 or more digits, size=13

The above sizes assume that the internal storage format is numeric, which is usually the case if loading from a database. Numbers loaded as text such as from a text file or inline, may be stored as strings which will occupy different sizes.

Dates, Times and Timestamps

Different Database systems provide various degrees of precision in timestamps and I assume the ODBC driver is also involved with the exact value provided to QV during the load. QV times are the fractional part of a day, using up to 9 digits to the right of the decimal point.

– Best size for a Date, 4 bytes.
– Best size for a full Time, 13 bytes.
– Best size for a full Timestamp, 13 bytes.

These sizes can increase when the field is manipulated. Want to get the date portion of a timestamp? Don’t use

date(aTimestamp)

date() is a formatting function, it doesn’t “extract” the underlying date portion. In many cases, it actually increases storage size because the result may be a string. Instead, use

floor(aTimestamp)

this will produce a 4 byte integer result.

A common technique for reducing the memory footprint of timestamps is to separate the timestamp into two fields, integer date and fractional time. You can further reduce the number of unique time values by eliminating the hundredths of seconds, or even eliminating the seconds if your application is ok with minute precision.

Strings

Thanks to QT support for providing this detail on Strings.

“The representation is that each symbol has a pointer (4/8 bytes on 32/64-bit platform) + the actual symbol space. This space is the number of bytes (UTF-8 representation) + 2 (1 is a flag byte and 1 is a terminating 0) + 0, 4 or 8 bytes that store the numeric representation of the field.”

So on the 32bit version, a non-numeric string occupies 6 bytes more than the length of the string itself. A numeric string occupies 10 more bytes. For example:

“a” uses 7 bytes
“1” uses 11 bytes

The only way to reduce the string footprint is to reduce the number of unique values. This can be done by breaking the string into component parts if that makes sense in the application. For example, the first 3 characters of a 10 character product code may be a product class. Breaking the field into ProductClass and ProductNumber fields may reduce the number of unique values.

If the strings are keys that don’t need to be displayed, the autonumber() or autonumberhash128() functions can be used to transform the values to 4 byte integers. With these functions you can also get the “sequential integer optimization” which reduces the symbols space to zero.

I’ve found that concatenating fields in autonumber like
autonumber(f1 & f2)
can sometimes produce false duplicates. Better to instead use autonumberhash128 like
autonumberhash128(f1, f2)
This seems to always produce correct results.

Sequential Integer Optimization

For each field, QV maintains both a Symbol table — the unique values of a field — and a State array that tracks which values are selected. If the symbol values are consecutive integers, a very clever optimization takes place. The Symbol space is eliminated and the State array is used to represent both selection state and value. This is a very beneficial effect of using the autonumber functions.

The values need not begin at zero for the optimization to take place, they only need to be consecutive. A set of 5000 consecutive dates will occupy no Symbol space. Take one date out of the middle and the storage reverts to the standard 4 bytes for each date.

It’s not always necessary to be concerned about memory usage. But when it is, I hope this information proves useful.

Share