Analyzing Field Usage in a QVW

Wouldn’t it be nice to know which fields are “unreferenced” in your document? By unreferenced, I mean fields that are not used in Expressions, Dimensions or Keys. These are fields that, if removed from your document, would have no impact on the visible elements of the application.

Removing unused fields is sometimes important in addressing performance issues and generally makes your application easier to maintain and understand.

The Expression Overview dialog is great for finding where a field is used, but what about the unused fields? I don’t know of any QV supplied tool that can identify unused fields, so I created one myself. My tool is a QVW named “DocumentAnalyzer” and it’s available for download from:

http://robwunderlich.com/Download.html

First off, let me make it clear that this tool is imperfect. It’s difficult to do a precise field usage analysis from “outside” of the product. I hope that this work of mine will encourage (goad?) Qliktech into writing a field usage analysis tool within the QV product. I’ll be happy if my work becomes obsolete.

I’ll explain the usage and limitations of DocumentAnalyzer, as well as identify some interesting code tidbits for anyone who may want to enhance or borrow from this app.

The app code itself consists of two pieces: a Macro Module that extracts meta information from the document to be analyzed (the “target” document), and a load script that processes the extracted metadata.

The Macro does a lot of filesystem access and requires System Access module security. If you have not allowed System Access, the macro will warn you and provide instructions for setting it.

Using DocumentAnalyzer is a two-step process, driven by buttons on the Main sheet. The first step is to choose a target document. Pressing the button will bring up a standard Windows file-chooser dialog. If a file-chooser dialog cannot be created, the user is instructed to type a filename directly into the input box.

Once a target document is selected, press the “Process Document” button. The Macro module will be invoked to extract the metadata to a series of files in your temp directory. After extraction, the load script reads the extracted files. At the end of the load script, the metadata files are deleted.

The tool will open the target document and navigate through the screens. Keep your hands off the keyboard while this process runs. When the load script is complete, you’ll receive a confirmation msgbox. Press OK.

The first chart of interest can be found on the “Fields” sheet. The Field References chart lists each FieldName and indicates whether the FieldName is used as a Key or Dimension, or referenced in an Expression, Macro or Variable. FieldNames not referenced anywhere are highlighted in yellow. These are fields you might consider dropping from the document.

FieldNames that contain special characters are flagged in red. The usage of these fields cannot be accurately determined due to limitations of the parsing method I’m using.

The Exception sheet displays warnings you may want to consider about the data presented. Of particular interest is the “Unmatched Fields” listbox. Check out the Help in this listbox to understand its contents.
The Parsing Algorithm
The identification of field references is performed by the load script. The parsing is rather primitive, so I’ll provide some explanation here so you can understand its limitations. You can also find this information on the “About” sheet of DocumentAnalyzer.
Field names are discovered in expressions by replacing “special” characters with a delimiter and then parsing into words using the subfield() function. For example, the expression:

“Avg(Price * Quantity) * 1.05”

is delimited to:

“Avg Price Quantity 1.05”

which is then parsed into four words: Avg, Price, Quantity and 1.05. The words are then matched against fieldnames. You will get a false match if a function name such as “Avg” is also used as a field name.
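
In load script terms, the idea looks roughly like this (a simplified sketch, not the production code; it assumes the extracted expressions are already loaded in a table named Expressions and maps only a few of the special characters):

// Map each "special" character to a space (the real script handles
// many more characters, including commas and quotes).
SpecialChars:
MAPPING LOAD char, ' ' as replacement INLINE [
char
(
)
*
+
/
=
];

// Break each expression into words: SubField() with no index argument
// generates one row per word; empty words are dropped by the preceding load.
Words:
LOAD Word WHERE Len(Word) > 0;
LOAD SubField(MapSubString('SpecialChars', Expression), ' ') as Word
RESIDENT Expressions;

// The words can then be matched against the document's field list,
// e.g. with Exists() or a join, to flag referenced fields.
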
If you just want to use DocumentAnalyzer, there’s no need to read further. If you want to learn something about the code, read on.


The Code
The Macro writes the metadata to a series of files. The files are normally deleted at the end of the script.
If you want to keep the metadata files, comment out this line in the script (on the Cleanup tab):
LET x = DeleteFolder('$(f)'); // Delete the data files

The Macro Module is of a fair size — about 500 lines. If you examine the code, you’ll find some conditional code (IsInternal()) devoted to allowing the macro to run internally in a qvw or externally from cscript.exe. The entire macro module can be copied to an external file and run with cscript. I coded for “dual” execution environments because I get a better editor and debugger in the external environment. I do the development running externally and then paste the script into the QVW for final testing.
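
The environment test can be as simple as probing for the ActiveDocument object, which is pre-defined inside QV but not under cscript.exe. Here’s a simplified sketch of the idea (not the exact module code, and the document path is illustrative):

' Inside QV the ActiveDocument object exists; under cscript.exe
' referencing it raises a trappable error.
Function IsInternal()
    On Error Resume Next
    Dim d
    Set d = ActiveDocument
    IsInternal = (Err.Number = 0)
    Err.Clear
End Function

Function GetDoc()
    If IsInternal() Then
        Set GetDoc = ActiveDocument                        ' running inside the QVW
    Else
        Dim qvApp
        Set qvApp = CreateObject("QlikTech.QlikView")      ' QV automation server
        Set GetDoc = qvApp.OpenDoc("C:\Temp\Target.qvw", "", "")  ' illustrative path
    End If
End Function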

Meta information (Field names, Dimensions, Expressions, etc.) is extracted from the target document using the QV API. Getting Dimension values was fairly easy; there are only a few API paths for the various object types.

Finding all the expressions was the most challenging part and took the most time to solve. There are many different properties where expressions may be used in sheet objects, and they may also differ by release. Discovering and writing API calls to extract every possible expression was not practical. I experimented with a number of approaches, including generating code from the API doc, but never got that to work correctly. The most promising approach was using the file export from the Expression Overview dialog. This gave me a complete list of expressions, but the exported file was not consistently usable. The export is a tab-delimited file, and an expression that contains tabs or newlines can make the file impossible to navigate.
My eventual solution was to export the objects to XML using the WriteXmlPropertiesFile API method and then extract the expressions from the XML files. I first tried to pick the expressions out of the XML with load script, but found this too cumbersome. I settled on using XPath to extract the data I needed from the XML files. XPath is a sort of “query language” for XML. Where SQL returns a set of rows, XPath returns a set of XML elements. Only a few XPath expressions were needed to cover all the possible expression locations.
Once I perfected the XPath method, I switched to doing the Dimension extraction this way as well.
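
Stripped down to its essentials, the XPath extraction looks like this (a sketch: the file name and the XPath expression are illustrative, and the real script covers several expression locations):

' Load one exported properties file and pull out every Expression element.
Dim xml, nodes, node
Set xml = CreateObject("Msxml2.DOMDocument.4.0")  ' on newer Windows, use Msxml2.DOMDocument.6.0
xml.async = False
xml.setProperty "SelectionLanguage", "XPath"
If xml.load("C:\Temp\CH01.xml") Then
    Set nodes = xml.selectNodes("//Expression")   ' all Expression elements, any depth
    For Each node In nodes
        WScript.Echo node.text                    ' running externally under cscript
    Next
End If
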
In QV 8.5, Sheets do not have an XML representation, so Sheet expressions (background color, conditional show, etc.) are extracted individually.
The Document Variables, Macro Module and Script are written to files as well. The Script is not currently processed by the load.
Contact me if you have any problems or questions on using the app. Contact information is on the “About” sheet.
Happy analyzing!
-Rob

12 thoughts on “Analyzing Field Usage in a QVW”

  1. Hi Rob,

    it’s great that you now have your own QlikView blog. Nice application!

    I created an application like this 2 years ago. I could only find part of it in the old QlikView Forum.

    [Structure.csv] is a previously exported file. I think the problem was to get hold of the system fields for table and field names.

    1. Export XML Files:

    sub testxml
        for i = 0 to ActiveDocument.NoOfSheets - 1
            set s = ActiveDocument.GetSheet(i)
            for j = 0 to s.NoOfSheetObjects - 1
                Objektname = s.SheetObjects(j).GetObjectID
                s.SheetObjects(j).WriteXmlPropertiesFile "C:\TEMP\" & Objektname & ".xml"
            next
        next
    end sub

    2. Reload XML Files:

    Cartesian:
    Load
    @1:n as Line,
    Filename() as Object
    from *.xml (ansi, fix, no labels, header is 0, record is line);

    Join
    Load
    qvwTable,
    qvwField
    from [Structure.csv] (ansi, txt, delimiter is ';', embedded labels);

    Field_to_Object:
    Load
    Object,
    qvwTable,
    qvwField
    resident Cartesian
    where index(Line, qvwField);

    drop table Cartesian;

    Regards
    Christian Bader

  2. Hi Christian,

    I remember your post on the forum and recall it was the inspiration for my version. Thanks for your contributions to the community.

    -Rob

  3. I’ve uploaded a new DocumentAnalyzer_V1.4 to the web site that includes Captions. I noticed that captions are sometimes expressions, so I produced two fields — Caption and CaptionEvaluated. CaptionEvaluated is only populated if the Caption value starts with “=”.

    There is a new table on the Objects sheet that displays the captions.

    The captions can be extracted from the objects using the API, but the call path differs slightly between object types. I found it easier to use the XPath extraction method and pull them from the XML, creating a captions.csv file.

    Let me know if this is what you were looking for or if you find any captions missing.

    -Rob

  4. Hi Rob,
    this is very nice.
    I presume a lot of work went into it, and I would like to say thanks for the time invested for the benefit of the community.
    I am from Israel, and when using your qvw I was faced with the problem that all my field names are in Hebrew and your code read them in from the csv files as gibberish. I looked into your script and saw that the problem arises from using ANSI to read in all the csv files. I replaced ANSI in the script with a Hebrew code page and now it’s working great. I thought I might mention it since I know QV is used globally and code pages differ. Maybe a variable that the user can change could solve this problem. 🙂 Thanks again, GREAT!!!! piece of coding.
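
    For example, the fix amounts to changing the format spec on each of the csv loads (a sketch; the file name is illustrative, and 1255 is the Windows Hebrew code page):

    // before: Hebrew field names come in as gibberish
    LOAD * FROM fields.csv (ansi, txt, delimiter is ',', embedded labels);

    // after: read the file with the Hebrew code page
    LOAD * FROM fields.csv (codepage is 1255, txt, delimiter is ',', embedded labels);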

  5. Hi Daniel,
    Thanks for your comments. Could you send me a qvw file that uses a Hebrew codepage, along with codepage id, and I’ll see if I can find a general solution? Thanks for the idea. You can find an email address for me on my website.
    -Rob

  6. Currently (Sept 2010) on Windows 7 64-bit this does not seem to work. The module errors out on line 195 with the error “ActiveX component can’t create object: ‘Msxml2.DOMDocument.4.0’”.

    Cheers

  7. Try changing that line to:

    'Msxml2.DOMDocument.6.0'

    I believe V6 is what ships with Windows 7. Let me know if it works and I’ll update the code.

  8. I always got this error: “ActiveX component can’t create object: ‘Msxml2.DOMDocument.4.0’”

    until I changed DOMDocument.4.0 to DOMDocument.6.0 and ran it again.

    It works like a charm now. Thanks Rob.
