QSDA Pro 3.6 introduces Custom Flags — a way to enforce your organization’s unique standards alongside QSDA’s built-in analysis.
In addition to the built-in QSDA flags that expose findings like errors and best practice violations, you can now define additional flags specific to your site.
Do you have unique company standards like:
Every App and master measure should have a description?
Apps in the Finance stream should have Section Access enabled?
Mobile Apps should have no more than ten objects per sheet?
You define the triggering condition, flag category and detailed flag description. Any triggered flags will appear in QSDA alongside the built-in flags. Custom flags use a distinctive icon to distinguish them from built-in flags.
You can assign the new flag to one of the built-in categories like “Practice” or assign to the new “Custom” category.
Create and edit rules under Admin > Custom Flags.
In the rule above, we define that every Sheet must have a thumbnail. When we select a Target Type of Sheet, properties of Sheet are made available in the Conditions and Detail Properties dropdowns. QSDA evaluates these conditions during analysis and raises the flag when they are met.
The Detail Properties are used to collect {n} placeholder values that can be substituted in the Description field. When this flag is raised in the QSDA analysis, the Description placeholder will be filled in like this.
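For instance, a hypothetical rule's Description template of

Sheet "{0}" is missing a thumbnail image.

would be rendered in the analysis with the Detail Property value substituted:

Sheet "Executive Dashboard" is missing a thumbnail image.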
You may have multiple conditions defined with and/or.
We’ve done our best to expose the data we think customers will need for rules. We currently expose these Target Types:
Each target type offers a number of different properties. If we missed something you need, or you just have a question about how we can meet a specific requirement, please reach out to our friendly Motio support team.
Summary: A rich set of regular expression functions has recently been added in Qlik Sense cloud and client managed. In this post I’ll introduce some use cases for this feature and show how to instantly become a regular expression master.
Regular Expressions (RegEx) is a powerful string pattern matching language that has been available in many programming and scripting languages for decades. Qlik Sense Cloud and the client-managed May 2025 release brings the full power of RegEx to Qlik via a new set of Chart and Load Script functions.
What can you do with RegEx?
Validate that a field value matches a specific pattern. e.g. Is this a valid phone number or credit card?
Extract substrings from a larger string.
Reformat a string to match your rules.
Here’s an example to get us started. I’ll validate that “Phone” values match the allowed format for US phone numbers.
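A chart expression along these lines does the job (a minimal sketch; as the function list later in this post notes, MatchRegEx returns the numeric position of the matching pattern, so a non-zero result means a valid format):

If(MatchRegEx(Phone, '^(?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[\- ]?\d{4}$') > 0, 'Valid', 'Invalid')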
The string '^(?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[\- ]?\d{4}$' is a regular expression pattern that represents the validation rule for US phone numbers. Like any powerful and syntactically terse programming language, RegEx patterns can look a little daunting at first. Here's what that pattern enforces.
This regular expression for US phone numbers conforms to NANP A-digit and D-digit requirements (ANN-DNN-NNNN). Area Codes 001-199 are not permitted; Central Office Codes 001-199 are not permitted. Format validation accepts 10 digits without delimiters, optional parens on the area code, and optional spaces or dashes between the area code, central office code and station code. Acceptable formats include 2225551212, 222 555 1212, 222-555-1212, (222) 555 1212, (222) 555-1212, etc.
Did I write that pattern myself? No, I copied it off an internet site. There are many internet repositories with thousands of patterns you can copy from. Those repositories have quickly been supplanted by AI chatbots.
Let’s look at another example. I recently answered a question on Qlik Community where the poster wanted to extract First & Last names from strings like:
James SmithMary JohnsonJohn WilliamsPatricia BrownRobert Jones-GarciaMichael MillerLinda Davis
ExtractRegEx is the Qlik function we want here. I know enough RegEx off the top of my head to create a pattern to split on the capital letters and spaces. But how do I deal with the hyphen in “Jones-Garcia”? That would make my head hurt.
AI is a great tool for generating RegEx patterns! Turn to your AI chatbot of choice. I use Copilot but I expect ChatGPT or any of the popular bots would do the job as well.
Asking Copilot to generate the pattern
The nice thing about this approach is I don’t have to explicitly state “..and handle hyphens”, I just supply sample data.
Is this pattern correct? You could just plug it into Qlik, but I prefer to first validate it using a RegEx tester. There are a number of free tools; regex101.com is the one I usually use.
After validating and potentially tweaking the pattern we use it in Qlik script like this:
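Here is a sketch of that script. I'm using the Copilot-suggested pattern and the basic two-argument form of ExtractRegEx described later in this post; see the documentation for the optional parameters that control which occurrence is returned.

Names:
LOAD
    // hyphenated surnames like "Jones-Garcia" are covered by the pattern
    ExtractRegEx(RawNames,
        '[A-Z][a-z]+(?:-[A-Z][a-z]+)* [A-Z][a-z]+(?:-[A-Z][a-z]+)*') as FullName;
LOAD
    'James SmithMary JohnsonJohn WilliamsPatricia BrownRobert Jones-GarciaMichael MillerLinda Davis' as RawNames
AutoGenerate 1;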
I do recommend you spend 30 minutes learning the basics of RegEx to make best use of assistants and testing tools.
You may have used the Qlik script SubField function to split a string into multiple rows based on a delimiter such as a comma. But SubField won't do when the comma is part of a quoted string, as is common in CSV files. The new SubFieldRegEx function handles this with ease.
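Here's a sketch of the idea, with two assumptions on my part: SubFieldRegEx generates one row per value when the index argument is omitted, just as SubField does, and the pattern flavor supports lookahead.

SET vLine = "Smith, John",42,"Portland, OR";

CsvParts:
LOAD
    PurgeChar(Part, '"') as Field       // strip the surrounding quotes
Where Len(Trim(Part)) > 0;              // drop any empty fragments
LOAD
    // split on commas that are outside double quotes
    SubFieldRegEx('$(vLine)', ',(?=(?:[^"]*"[^"]*")*[^"]*$)') as Part
AutoGenerate 1;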
Don’t quit if RegEx only gets you 90% of the way there. Stack some Qlik functions like the where clause above to finish the job.
RegEx implementations in other languages use an option flag to indicate a case-insensitive operation. Qlik has opted to use an additional function with the suffix “l” to indicate a case-insensitive operation. So:
MatchRegEx – case sensitive
MatchRegExl – case insensitive
A really nice feature of the Qlik MatchRegEx function is that it allows for multiple match patterns to be specified.
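For example (a sketch, assuming a string that matches no pattern returns 0, as with the standard Match() function):

Pick(MatchRegEx(PostalCode, '^\d{5}$', '^\d{5}-\d{4}$') + 1, 'Other', 'ZIP', 'ZIP+4')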
The Qlik RegEx functions are documented in the String Functions section. I do wish they had given the functions a separate section. I'll summarize the available functions here with the caveat that these are correct at the time of this posting.
ExtractRegEx() extracts text from an input string expression using the specified regular expression pattern. The function returns a null value if no matches are found.
IndexRegEx() searches the input string and returns the starting position of the nth occurrence of the specified regular expression pattern.
MatchRegEx() compares the input string with one or more specified regular expression patterns, and returns the numeric location of the regular expression patterns that match.
ReplaceRegEx() returns a string after replacing one or more matches between an input string and a specified regular expression pattern.
SubFieldRegEx() extracts text from an input string expression, using the specified regular expression pattern as a delimiter.
There are a number of useful optional parameters, see the documentation for details. There is also ReplaceRegExGroup that I have not yet wrapped my head around. Feel free to post a use case in the comments.
Are you going to use these functions a lot? Probably not. But when you need to clean messy data or extract relevant data these functions can be invaluable. I’ll leave you with one last example, extracting data within HTML tags.
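Something in this spirit, assuming ReplaceRegEx takes input, pattern and replacement in that order: strip every tag and keep the inner text.

ReplaceRegEx(HtmlField, '<[^>]+>', '')    // HtmlField is a hypothetical field

Given '<b>Total:</b> <span>42</span>', this returns 'Total: 42'.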
Longish post today where I show off the power of QSDA Pro to reduce Qlik App resources, saving you money and creating a more responsive experience for your users.
As a follow-on to my last post “Help! My Qlik Cloud Reload Exceeds Quota!” I'm going to dive deeper into the available QSDA features to reduce the memory requirements of a Qlik App. As a bonus we'll use QSDA to improve the performance of a chart.
There are many good reasons to optimize your Qlik Apps to a leaner size. Smaller apps tend to perform better and place less demand on your server resources. If you are a Qlik Cloud customer, or planning a move to Qlik Cloud, App memory size impacts your license tier — and therefore your direct monthly costs.
In this post our starting point is that I have identified the App “BikeShop Sales” as having a large (597 MB) footprint. I got this information from Qlik App Metadata Analyzer, as shown below.
Qlik App Metadata Analyzer
I could also have identified this memory footprint in QSDA Pro by selecting this App from the list.
QSDA Applist
Let’s see how QSDA Pro can help. I’ll launch a new Analysis and give it the Description “Baseline”.
When the analysis completes press “View” to see the results.
QSDA Analysis Summary page
In the Resources card I see that unused fields occupy 383 MB — almost 2/3 of the base data model! Unused fields are fields that are not used anywhere in the front end. They can be removed without impacting the contents of charts or the user experience.
I also see some other suggested savings in dropping unused tables (“All fields unused but keys — 439 KB”) and a 530 KB savings by AutoNumbering keys.
A fast and simple way to remove unused fields is to drop them at the end of the script with a “DROP Fields …” statement. AutoNumbering is also best done at the end of the script. Let’s implement the QSDA recommendations. Open the Qlik Data Load Editor for “BikeShop Sales” in another browser tab.
The QSDA Tools menu provides a handy Script Generator to generate recommended script statements. Select the script generator from the Tools menu.
QSDA Tools menu
In the Script Generator > AutoNumber tab select the top checkbox to select all recommended fields. Press the “Copy” button to copy the generated AutoNumber statement to your clipboard. Paste the statements at the end of your Qlik Script. Repeat for the Drop Fields and Drop Tables tabs. (I won’t use the Load Statement tab just yet).
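The pasted statements will look something like this (the field and table names here are illustrative, not from the actual app):

AutoNumber [SalesOrderID], [CustomerID];
DROP Fields [rowguid], [ModifiedDate];
DROP Table [UnusedLookup];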
Now that I have updated the script with the AutoNumber and Drop statements, I reload.
Reload with AutoNumber, Drop Fields, Drop Tables
Returning to the QSDA App Card, I can see that Base App Memory is now 213 MB. Quite the savings from 507 MB!
App card after reload
At this stage, I recommend running a new analysis to confirm that the script changes have not introduced any problems or new issues.
Note that Reload Peak Memory is unchanged, because we loaded the unused data before we dropped it. Reload memory impacts the performance of on-prem servers, and in Qlik Cloud it is a direct cost.
Can we use less reload memory by not loading the unused fields and tables? It may be that a field is used in the load script but not in the front end, so you have to examine your script to determine which fields are safe to exclude. This very clean script loads directly from QVDs, so the answer is clear.
I'll start by commenting out (or removing) the tables we don't need instead of dropping them at script end. To identify the unused tables I can refer to the previously generated script, the Flags, or return to the Script Generator tool in the Baseline analysis.
What about Tables with both used and unused fields? Can I easily load just the used fields and omit the unused fields? QSDA Script Generator to the rescue again. Open the Script Generator > Load Statement in the Baseline analysis and sort on the “Unused Bytes” column. Select the SalesOrderDetail table.
A Load statement fieldlist for [SalesOrderDetail] will be created with unused fields commented out. You can exclude the unused fields entirely by unchecking the “Include Unused Fields” box. I’ll start with just the SalesOrderDetail table for now. Note that the other tables with big Unused Fields savings, Notes and Customers, are already being excluded completely.
Before reloading, I’ll remove the AutoNumber and Drop statements I added earlier. Many of those drops are no longer valid as I am not loading the fields at all. Here is the updated load script.
Updated script to exclude unused fields
After reloading, I return to the QSDA App Card. Reload Mem is reduced from 693 MB to 268 MB! I’ll run another analysis to see where things stand.
I can address the remaining unused fields with DROP Fields or the Load Statement method. I will add the AutoNumber statements back in as well. I hope the process is clear enough by now, so I won't show the remaining steps in detail.
Let’s turn our attention to another great value of QSDA — Performance improvement. The “Calc Time Distribution” chart plots the calculation time of each chart in the App. I see I have an outlier that takes about 4 seconds to calculate.
What is this chart and why is it taking so long to calculate? Flip over to the Viz page and sort by the Calc Time column. The viz in question is a Table named “Salesperson Scorecard” on sheet “Dashboard” and it takes 4.142 seconds to calculate. Expanding the row shows the chart Dimensions and Measures and also exposes the “Viz Deconstruction” icon.
Viz Details
Clicking the Viz Deconstruct icon will calculate each chart Measure in turn and tell us what is taking so long to calc in this chart. We can see the total calc time is over four seconds and measures three and four take considerably longer than the first two. But why?
Let’s focus on this chart to get some insight into the cause and solution of the poor performance. Back on the Viz page, Right-click the Viz of interest and select “Filter to Selected”. This will filter our pages to only those objects — Dimensions, Expressions, etc — associated with this Viz.
Click over to the Expressions page and we'll see that our long-running expressions have flags. Flags are the advice system of QSDA, identifying errors, bad practices and performance issues. Our two-second expression “Longest Deal” has a “Key Field used in Expression” flag.
Clicking the “i” icon next to the flag will open the Flag Details panel which contains a link to the help article for this flag.
In the help article we’re informed that using a key field in an expression can sometimes cause poor performance. The article goes on to advise using a non-key field instead, creating a new field if necessary. I’ll take this advice and create a new field in the SalesOrderHeader table:
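The addition looks something like this (a sketch; I'm assuming the key field is named SalesOrderID and the table loads from a QVD):

SalesOrderHeader:
LOAD
    *,
    SalesOrderID as SalesOrderRecord    // plain, non-key copy of the key
FROM [SalesOrderHeader.qvd] (qvd);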
Reload and change the chart Measure to use the new field.
Max(Aggr(Sum(LineSales), SalesOrderRecord))
Run a new analysis to check the calc time of the chart (and check we didn’t impact anything else!). In the summary we’ll see a significant decrease in total app calc time and we can drill into our chart to see what’s changed.
Whoa! That measure has gone from 2 seconds to 0.008 seconds.
But Rob, didn’t I just blow up my RAM by making a copy of this big field? Don’t guess, use QSDA to see exactly what the cost is.
Additional RAM of SalesOrderRecord field
The cost is 50 KB of RAM and no significant increase in Reload Mem. Was it worth it? I think so.
If you choose to put the new field on the SalesOrderDetail table instead of SalesOrderHeader, the QSDA results will quickly show you that this was a bad idea: RAM increases significantly and performance suffers.
QSDA Pro gives you the tools to:
Reduce your App memory and compute requirements.
Get focused advice and insight for improving your App.
Compare different solutions and measure the impact of change.
Track your progress and document the impact of your work.
I hope this post makes it clear why I say that QSDA Pro is “the tool that every Qlik developer needs”.
Do you have questions about QSDA Pro or want to schedule a demo for your team? Reach out to us.
If you’re attending Qlik Connect in Orlando, stop by the Motio booth on the show floor. I’ll be there (guy in the hat) to answer your questions and show you even more features of QSDA Pro that can help you create leaner, cleaner and faster Qlik Apps.
If you are a Qlik Cloud (SaaS) customer you may have seen this dreaded message during script reload. Ack! Ack!
Your quota may be different than 25 MB, but the problem remains the same: how do I modify this application to fit within my subscribed quota?
QSDA Pro V3.1 to the rescue! QSDA analyzes your app and provides recommendations and easy-to-use tooling to drop unused data and optimize the remaining data, potentially greatly reducing the size of your Qlik App.
QSDA needs a successfully loaded app for analysis, so our first step is to reload the app using the “Limited load” feature of the Qlik script debugger.
Click the Debug button in the Qlik script editor, check the Limited load option, accept the default of 100 rows and click “Run” to initiate the reload. The reload will come in under your quota. When the reload completes, press Ctrl-S to persist the data.
You don’t have a business-ready app, but this is enough for QSDA to scan all charts, expressions and master items to determine what data is actually required.
In QSDA Pro we now choose our Cloud connection and select the app of interest, in this case “BikeShop Sales2”. Click the “Open” button to prepare the app for analysis. I will also uncheck the “Collect Calctime” option because chart calculation times are not meaningful with our limited data. Description is optional; in this case I've entered “Limited Load”.
Click the “New Analysis” button to launch the QSDA Pro analysis. The analysis may take a minute or two to complete depending on the capacity of your tenant and the current Qlik Cloud load. When the analysis is complete press the “View” button to see the results.
In the Summary view we see there are 89 unused fields. Unused fields are fields that are not used in any charts, master items or expressions. These are fields that can be dropped at the end of the script without impacting the user experience in the app. The actual storage numbers (KiB) are not meaningful because we only loaded 100 rows. The bar chart is useful in that it indicates about 2/3 of our data is unused. There is hope!
QSDA provides a Script Generator in the Tools menu to generate a “DROP Fields …” script statement for unused fields. Select the script generator from the Tools menu.
In the Script Generator > Drop Fields tab select the top checkbox to select all recommended fields. Press the “Copy” button to copy the generated Drop Fields statement to your clipboard.
Paste the Drop Fields statement at the end of your Qlik Script and reload.
Reload successful! Victory!
A subsequent QSDA analysis shows the app size has been reduced from the problematic 37 MB to 15 MB!
QSDA Pro can quickly and efficiently get your Qlik Apps below your Qlik Cloud quota by pruning unnecessary data. Don’t pay for more than you need.
In a follow up post I’ll walk through some additional QSDA features to help you reduce Qlik app footprint even further and improve chart response time to boot.
Download QSDA Pro and try it for yourself. Or reach out to us to learn more about license options or arrange a demo or POC for your team.
-Rob
Are you going to Qlik Connect? I'll be at the Motio booth on the show floor ready to demo QSDA Pro or answer any questions you may have. Or just to say Hej 🙂
Summary: Set Modifier field names and Set Identifiers are not validated by Qlik syntax check. QSDA Pro 2.5 validates everything and raises an “Unrecognized Name” flag for invalid names.
QSDA Pro syntax checks all expressions in a Qlik App using the Qlik syntax checker. You’re familiar with the Qlik syntax checker. It’s the message at the bottom of the expression editor that tells you your expression is “OK”.
The syntax checker is also good at telling you when the expression has an error, although it’s not always specific about the problem.
QSDA Pro, using the Qlik API version of syntax checker, tells you what is wrong with the expression:
The Qlik syntax checker has a significant limitation. It does not validate field names used in sets. Whether used in the API or the editor, the syntax checker will not raise an error if the field “EmpPurchase” does not exist in the data model.
This lack of validation can be a serious problem if “EmpPurchase” is removed or renamed in the data model. The expression will still return a number but the exclusion of employee purchases will no longer be applied. That could be a very subtle but important fail.
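For example, with a hypothetical expression like:

Sum({<EmpPurchase={0}>} Sales)

the syntax checker reports “OK” whether or not EmpPurchase exists in the data model.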
Recognizing this limitation and its potential impact, QSDA Pro 2.5 validates all names used in sets and will raise a new Quality flag, “Unrecognized Name”, if the name is invalid.
Another place this validation is extremely useful is Chart Filters in Qlik SaaS. The chart will cheerily display that filter “Expression2 > 1” is being applied. Even though field “Expression2” no longer exists in the data model.
But QSDA knows.
Ever use a bookmark as a set identifier? And then the bookmark got deleted or renamed?
I’ve used simple examples to demonstrate, but where this validation really shines is in longer expressions where it may be easier to overlook that something has gone wrong.
What does QSDA stand for? Qlik Sense Document Analyzer. The combination of data model and sheets we know as an “App” in Qlik Sense was called a “Document” in QlikView. When I first created this tool in 2009 to help maintain QlikView, I called it “Document Analyzer”. When it came time to create a similar tool for Qlik Sense, I stuck with the “DA” name as the function and usefulness of “Document Analyzer” was well established in the Qlik community.
Gitoqlok is a free Chrome plugin that allows you to use a git repository such as GitHub, GitLab, Azure DevOps, etc., to manage all the pieces of a Qlik Sense app — Measures, Sheets, Charts, Load script and more.
The plugin operates seamlessly within the Qlik Sense authoring experience to provide the power of fine grained version control you expect with git. You’ll see a list of changes and commit those changes as you would expect in any software development project. Of course there is collision detection, branching and merging and all the goodness you expect from git.
Because the Gitoqlok team understands the visual nature of Qlik development, you can view diffs in a visual format. A slider lets you pick a commit for compare.
You can also use the same slider in a non-diff “time machine” mode to see how the application sheets looked at any point in time.
Gitoqlok comes with a lot of very useful Qlik dev features, such as the ability to import a chart or script snippet from one app to another. Or deploy an entire app from one server to another!
Most Gitoqlok features are free to use. Premium features and support are available in a reasonably priced subscription.
If you are attending QlikWorld next week, drop by the Motio booth (#315) to see more. Motio is also presenting two breakout sessions showcasing Gitoqlok, including its integration with my QSDA Pro tool.
Summary: While looking into long expressions I noticed that the optional sort-weight argument has an impact on the distinctness of Concat(distinct…). Incorrect use of sort-weight can generate bloated expressions containing redundant code.
In my work tuning Qlik Apps I sometimes encounter very long expressions. An expression many thousands of characters long can be difficult to debug, and its goal can be hard to discern. To help in working with long expressions I've added an Expression histogram and an Expression Decoder feature to my QSDA Pro product. (These features are currently in beta, generally available in early Feb).
I’ve noted expressions of length greater than 50k across apps from different customers. What did these expressions have in common that made them so large?
They used the Concat() function in $() to dynamically generate a part of the expression.
They used the optional sort-weight argument of Concat() incorrectly.
They were much bigger than necessary — sometimes 100x — but the expanded expression worked as intended.
In the process of reviewing the expressions I learned something surprising. As a reminder, here's the syntax of the Concat function:
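Concat([DISTINCT] string[, delimiter[, sort_weight]])

(Set analysis and TOTAL qualifiers omitted for brevity.)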
This means that the set of distinct values is the set of combinations of string and sort_weight (if used). Let me demonstrate with an example. Here's a sample data table.
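A representative version, reconstructed to be consistent with the results that follow:

RecId  Dim  Weight
1      a    1
2      b    2
3      c    3
4      c    3
5      c    3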
For the expression: Concat(Dim, ',') we receive output "a,b,c,c,c“.
Adding the DISTINCT keyword: Concat(DISTINCT Dim, ',') we now get “a,b,c“.
Adding a non-distinct sort-weight argument: Concat(DISTINCT Dim, ',', RecId) we now get "a,b,c,c,c” again. More output than I expected. It’s a distinct list of the combinations of Dim and RecId.
Adding a distinct sort-weight argument: Concat(DISTINCT Dim, ',', Weight) we now get "a,b,c“.
How about if we used an unlinked data island field for sort-weight? The Island field has two values.
Concat(DISTINCT Dim, ',', IslandField) returns "a,b,c,a,b,c“. Item count is the product of Dim * IslandField values. Remember this for later.
Ok, this is all very interesting but the behavior is super obvious and I would notice it if it came up in my App. What’s this got to do with ginormous expressions?
Developers sometimes use Concat along with Dollar Sign Expansion (DSE) to generate dynamic expression fragments. For example to ignore all fields from several tables in a set modifier:
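A sketch of that pattern, using the system fields $Field and $Table (the table name is illustrative):

Sum({<$(=Concat({<$Table={'Calendar'}>} $Field, '=,') & '=')>} Sales)

The inner Concat builds the field-name list, so the expansion produces a modifier like <Quarter=,Month=,Year=>.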
Sometimes $(=Concat(...)) is used to build the list inside a Pick() or Match(). These types of expressions frequently have awkward syntax, including lots of “& chr(39) &” type stuff. Inevitably the expression gets built by copying and modifying an expression from elsewhere in the App. An expression that contains a sort-weight. A sort-weight that doesn't get removed. It may be an island field or a field that has a many-to-one relationship. The result is an expanded expression that works but is larger than it needs to be. No one notices (unless they use QSDA Pro) because the bloat exists only in the expanded expression.
As a simple example, suppose the “ignore filter” expression above was supposed to generate something like "Quarter=,Month=,Year=“. If I inadvertently use a sort-weight field that has 100 distinct values the result will be repeated 100 times. The expression would still work but it would be 100x larger than necessary.
I recently found an example where Concat was used to generate an If() function from data (very clever) that should have had 15 branches. But an unrelated sort-weight field of 95 values resulted in 1425 branches! It “worked” but did a lot of unnecessary calculation.
If you are a solo developer or working in a small team you may never encounter this issue. But if you are a consultant or maintaining legacy Apps you may stumble across it. I've been playing with ways to flag this condition in QSDA Pro. QSDA already flags data island expressions. I'm testing a new flag specifically for Concat().
Qlik has implemented a new “Simplified Authoring” experience in the SaaS (Cloud) environment. It’s available as an optional preview now and will be implemented for all SaaS tenants in mid-August.
I won’t describe the new features as they are well described here. If you haven’t already, I recommend you read the post and then come back here for my comments.
I think this is really great work and a significant improvement to the chart authoring experience. It not only makes it easier for newbies to create charts, I prefer using this interface in most cases. What do I like about it?
Moving the data panel to the left and increasing the display density is much more productive and fluid. This change seems to signal the death of the “iPad first” design. I’m pretty happy about that. The ability to work with field values while in design mode is fantastic.
The addition of the Data Table is brilliant. It’s much more fluid than swapping tabs to use the Data Model Viewer or building a temp table on the canvas.
I didn’t think I would like the “smart grid” feature but I’ve become a fan. With additional elements like Data Table the working canvas becomes relatively smaller and I’ve found it simplest to just add my chart in the general location and move on. As a final step I can switch to advanced mode and fine tune the chart locations and sizes if necessary.
Very cool feature to me is the new Chart Filters. This filter is implemented as a set expression applied to the data before expression set analysis is applied. So you now have three layers of data filtering — User Selections, Chart Filter, Expression (Measure) set analysis.
The chart filter is requested by a set expression in hypercube property qHyperCubeDef.qContextSetExpression . So you can use it in the Engine API if the engine level supports it. Potentially we could see Chart Filters available in a future QSEoW release. I think it would be useful to have Chart Filters as a master item for reusability and consistency.
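For example, a chart filter set through the Engine API might look like this in the hypercube definition (the set expression value here is illustrative):

"qHyperCubeDef": {
    "qContextSetExpression": "{<Region={'EMEA'}>}"
}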
If you are using QSDA Pro to evaluate your Qlik Sense apps (why not!), QSDA Pro release 2.1 supports capturing and parsing the new Chart Filters.
Want to up your Qlik Development game? Register for the Masters Summit for Qlik — Madrid 19 Sept or New Orleans 14 Nov — three days of advanced Qlik Developer training and tips.
Can I use an expression as the “fieldname” in a Set Modifier? In “<x={y}>” can “x” be an expression?
I always believed the answer to be “No”, x may only be a field name. That understanding is reinforced by what I read in the help doc for “set modifier”.
Yesterday I was surprised to discover that in Qlik Sense, this is a valid expression that returns sales for year 2015.
sum({<"=Year(OrderDate)"={2015}>}Sales)
This also works:
sum({<"=left(Country)"={'M'}>}Sales)
This is all news to me. And very interesting.
I stumbled across this accidentally when using the Expression Editor > Set Analysis > Insert button. I had selections in the derived field OrderDate.autocalendar.Year. The set generated by the Insert tool was:
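Approximately this (reconstructed from the examples above):

{<"=Year(OrderDate)"={2015}>}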
That expression is the derived field definition that was specified in the script.
I have not yet formulated an opinion as to whether this is useful, or if there are any cautions or limitations when using. I’m at the curious stage at this point and will look into it more as well as read comments I am sure will follow.
Summary: In this post I discuss when you may want to segment or “shard” QVD files and demonstrate some script patterns for producing and consuming segmented QVDs.
I recently received a question from a colleague who knew that I had done a lot of work with Qlik QVD files. He asked, “what’s the optimal size for a QVD file?”
My response was that in my experience there is no optimal physical size, but in many cases there are reasons to break a large QVD into multiple, smaller files. Dividing a large file into smaller files is called “segmenting” or “sharding”.
People generally start to think about segmenting QVDs when they encounter resource constraints while updating a QVD with an incremental load. In an incremental load, only new or changed rows are extracted from the database and merged with an existing QVD. This involves reading and writing the entire large QVD which can use significant time and I/O resources. This also means that the process takes increasingly longer as time marches on. Not pleasant.
Other reasons you may want to segment are consumption and archival patterns. It’s common to use the “latest n periods” of data in your active dashboard, for example the current 12 months. If the data is in a single large QVD, you have to roll off (delete) data older than 12 months. You can filter as you load, but that becomes an increasingly wasteful activity over time.
It’s likely that you will want to retain the older data for use in special projects or historical dashboards.
Given the example scenario above, it would make sense to create one QVD for each month. This will provide predictable performance for incremental updates as well as the dashboard load. Older data could be kept forever and any set of months could be accessed efficiently.
How do we perform this magic segmentation? Let's assume an example QVD with these fields:
TransactionID: Primary key
TransactionDate: Original date of transaction
LastUpdate: Timestamp of last update to this row. Transactions may receive updates up to 60 days after creation.
other…: Other fields such as Amount, Customer, etc.
We want to create one QVD per month using the name “Transactions-YYYY-MM.qvd”. What determines which QVD a transaction is placed in? Is it the MonthStart(TransactionDate)? It depends…
The simplest technique is for the extract script to place everything loaded today into the current month QVD, regardless of the TransactionDate. The QVD name is assigned to a variable in the script using:
Let vQvdName = 'Transactions-' & Date(Today(1),'YYYY-MM') & '.qvd';
When later loading 12 QVDs into the dashboard, load front (most current QVD) to back with the clause:
Where not Exists(TransactionID)
The Where clause will ensure that only the most current row for that TransactionID will be loaded.
This simple technique might be OK for most scenarios. But it's not very robust: it falls down when you want to do something like a full reload to add columns, or when data is loaded off schedule. It also means that if we want to load something like 6 months from the middle, we have to be careful to include enough later QVDs to cover possible updates.
A more robust approach would be to store each transaction row in the QVD corresponding with its TransactionDate. Here is one script pattern to do just that. Our starting point for this script is that we have already extracted the new and changed rows to create table “Transactions”.
Step #1 is to collect the month values into a temp table:
TempMonths:
LOAD Distinct MonthStart(TransactionDate) as TranMonth
Resident Transactions;
Next we process each TranMonth value in a loop block. The block will build a temp table of rows for just one month and merge with any existing QVD.
For i = 1 to FieldValueCount('TranMonth')
  Let vMonthName = Date(FieldValue('TranMonth', $(i)), 'YYYY-MM');
  Set vQvdName = Transactions-$(vMonthName).qvd;

  MonthTransactions:
  NoConcatenate LOAD * Resident Transactions
  Where MonthStart(TransactionDate) = FieldValue('TranMonth', $(i));

  If FileSize('$(vQvdName)') > 0 THEN   // If we have existing QVD
    LOAD * From [$(vQvdName)] (qvd)
    Where Not Exists(TransactionID);
  ENDIF

  Store MonthTransactions into [$(vQvdName)] (qvd);
  Drop Table MonthTransactions;
Next i
Drop Table TempMonths, Transactions;
The above segmenting script would support incremental reload, full reload or a load of any data in between.
So now we have many “Transactions-YYYY-MM.qvd” files. How do we load the current 12 months? Do we wake up early on the 1st of each month and quickly change the script? No. We create a dynamic script based on the current day.
For i = 0 to -11 step -1   // 12 Months
  Let vMonthName = Date(AddMonths(Today(1), $(i)), 'YYYY-MM');
  Transactions:
  LOAD * From [Transactions-$(vMonthName).qvd] (qvd);
Next i
If we had built the QVDs using any technique that allowed for the possibility of duplicate TransactionID, we would add a guard of “Where not Exists()”.
... From [Transactions-$(vMonthName).qvd] (qvd) Where not Exists(TransactionID);
What About IntraDay High Volume Reloads?
In scenarios with Intraday loading and high transaction counts, I prefer to defer merging QVDs to off-peak times.
Let’s take an example scenario of a customer who generates approximately 10 million transactions per day, with peak hours creating about 2 million transactions. The main dashboard should be refreshed hourly for twelve hours each day and should contain the last 10 days of transactions. Of course all data should be kept around for various summary analyses and ad-hoc projects.
It makes sense to segment these QVDs by day. Our hourly incremental load will need to merge with — read and write — a fairly large daily QVD. Crucially, the load time gets longer as the day progresses and the QVD grows. And now I hear rumors of twice-hourly reloads. This pattern has a bad smell.
What to do? Let's store the hourly incremental rows in an hourly QVD of their own. The dashboard will then pick up all hourly QVDs plus the required daily QVDs. Overnight, when we have some breathing room, we will run a script to consolidate the hourlies into a daily QVD.
The hourly incremental QVD is created like:
Let vQvdName = 'Hourly-Transactions-' & Timestamp(Now(1), 'YYYY-MM-DD-hh-mm-ss') & '.qvd';
Store Transactions into [$(vQvdName)] (qvd);
Then the dashboard will load the new data using a wildcard load for the Hourly QVDs and a loop for the prior days:
// Load all Hourly QVDs
Load * From [Hourly-Transactions-*.qvd] (qvd);

// Load previous 9 days of Daily QVDs
For i = 1 to 9   // 9 Days
  Let vDateName = Date((Today(1) - $(i)), 'YYYY-MM-DD');
  Transactions:
  LOAD * From [Transactions-$(vDateName).qvd] (qvd);
Next i
Getting LastUpdate From a QVD
One of the steps in incremental loading is determining what “zzz” value to use in the SQL “Where LastUpdate >= zzz”. We need the high value from the last load. Some people store this information in a side file or variable. I think the most reliable approach is to get the high value from the existing QVD.
Getting Max(LastUpdate) from a very large QVD can take some time (how to do this the quickest is always an interesting pub question). My preferred technique is to store a new field “MaxLastUpdate” in the QVD and then read only the first row of the QVD to retrieve the value.
Getting and Joining Max(LastUpdate) should be fairly quick because we are only dealing with the incremental rows.
Transactions:
SQL Select * From db.transactions where LastUpdate >= foo;

Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions;
The latest MaxLastUpdate value can then be retrieved by reading only the first row of the existing QVD. Here's how it looks all together, using the example of monthly QVDs.
Let vMonthName = Date(Today(1), 'YYYY-MM');

TempMonth:
First 1 Load MaxLastUpdate
From [Transactions-$(vMonthName).qvd] (qvd);

Let vMaxLastUpdate = TimeStamp(Peek('MaxLastUpdate'), 'MM-DD-YYYY hh:mm:ss');
Drop Table TempMonth;
Transactions:
SQL Select * From db.transactions where LastUpdate >= '$(vMaxLastUpdate)';
Left Join (Transactions)
Load Max(LastUpdate) as MaxLastUpdate
Resident Transactions;
// Merge or segment with existing QVDs
I hope you found some useful tips in this article. No doubt you have some ideas of your own, feel free to add comments.
Want to learn more advanced scripting techniques? After 2 years of virtual sessions, the Masters Summit for Qlik is back with live events this fall. In September we’ll be in Madrid, Spain, and in November we’ll be in New Orleans, USA. If you want to take your Qlik skills to the next level, get access to all sorts of ready-to-use solutions and reference materials, share experiences and maybe a few horror stories with your peers then we hope to see you there!