Summary: A script pattern to wildcard load from multiple files when the column names vary and you want to harmonize the final fieldnames. Download example file here.
I’m sometimes wondering “what’s the use case for the script ALIAS statement?”. Here’s a useful example.
Imagine you have a number of text files to load; for example extract files from different regions. The files are similar but have slight differences in field name spelling. For example the US-English files use “Address” for a field name, the German file “Adresse” to represent the same field and the Spanish file “Dirección”.
We want to harmonize these different spellings so we have a single field in our final loaded table. While we could code up individual load statements with “as xxx” clause to handle the rename, that approach could be difficult to maintain with many variations. Ideally we want to load all files in a single load statement and describe any differences in a clear structure. That’s where ALIAS is useful. Before we load the files, use a set of ALIAS statements only for the fields we need to rename.
ALIAS Adresse as Address; ALIAS Dirección as Address; ALIAS Estado as Status;
The ALIAS will apply the equivalent “as” clause to those fields if found in a Load.
We can now load the files using wildcard “*” for both the fieldlist and the filename:
Clients: LOAD * FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq) ;
It’s magic I tell you!
What if the files have some extra fields picked up by “LOAD *” that we don’t want? It’s also possible that the files have different numbers of fields in which case automatic concatenation won’t work. We would get some number of “Client-n” tables which is incorrect.
First we will add the Concatenate keyword to force all files to be loaded into a single table. As the table doesn’t exist, the script will error with “table not found” unless we are clever. Here is my workaround for that problem.
Clients: LOAD 0 as DummyField AutoGenerate 0; Concatenate (Clients) LOAD * FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq) ; DROP Field DummyField;
Now let’s get rid of those extra fields we don’t want. First build a mapping list of the fields we want to keep.
MapFieldsToKeep: Mapping LOAD *, 1 Inline [ Fieldname Address Status Client ] ;
I’ll use a loop to DROP fields that are not in our “keep list”.
For idx = NoOfFields('Clients') to 1 step -1 let vFieldName = FieldName($(idx), 'Clients'); if not ApplyMap('MapFieldsToKeep', '$(vFieldName)', 0) THEN Drop Field [$(vFieldName)] From Clients; EndIf Next idx
The final “Clients” table contains only the fields we want, with consistent fieldnames.
Working examples of this code for both Qlik Sense and QlikView can be downloaded here.
I hope you find some useful tips in this post. Happy scripting.
12 thoughts on “Loading Varying Column Names”
Great tip Rob…
This is a great approach for many scenarios. The only downside or risk is if you suddenly later find that you have field name collisions in your source files. Depending on your environment this may or may not be an issue (e.g. if you have the ability to change the source files), or the files are not too complex.
For example, one source file calls a field called “Address1” which they want to use as the main address field to be loaded, and another file has a field called “Address” which should be loaded, and another field called “Address1” which should be ignored.
Not a major issue – but just something to keep in mind depending on the complexity of your environment and the amount of control you have over amending or changing the source files!
Where I’ve hit this issue, I just created a central control sheet to specify the files to load, and the “types” of the files to be loaded. The file “type” would be expecting a certain set of fields. Slightly less dynamic than the above approach, but works around scenarios where you have a large variety of files to load, and you don’t have the ability to change the structures. Also gives you the opportunity to optimise the loads if the files are very “wide” and are only interested in a few columns.
Thanks Rob, this is a useful simplification to scripting issue for same dataset from multiple sources I have.
I was wondering, could the ‘ALIAS’ element be maintained in a spreadsheet then loaded in rather than actually having to change the script every time a new alias pops up. It would mean the actual QVS could stay as is and all that would have to be changed and promoted would be an Excel mapping table which is a lot easier to maintain and manage?
My thanks in advance for any suggestions
Yes, the Alias’ could be maintained in an external file. In it’s simplest form it would be a text file that contains the “ALIAS x as y;” statements. Then pull the file in with a $(Must_Include).
If you wanted to store oldname/newname in an Excel file, you would load that and then use a loop to issue the ALIAS.
very clever to do the loop over the fields in “reverse” order!
As you probably noted, if you process the field numbers in ascending order, dropping fields changes the fieldnumber of the remaining fields. Processing in descending order avoids that problem.
Great post Rob. very useful. Thanks!
Is there a reason why the FieldValueList(‘Table’) is not used in this example to specify number of iterations and obtain the field names?
I’m not sure how you would use FieldValues() in this case. Can you give an example of what you are thinking?
Could you please explain how the “if not ApplyMap” works here?
It perhaps is a little hard to read in the blog format. The “If” through the “Then” should be on one line.
The loop goes through each field that has been loaded. The ApplyMap tests if the field is included in the “MapFieldsToKeep” table. If “not”, then the field is DROPped.
Thanks for sharing Rob!
I love your blog, you help me a lot!
Greetings from Argentina