DROP FIELD Does Not Release All Space

During the “Performance Tuning” session at the Barcelona Masters Summit, Oleg Troyansky demonstrated using Document Analyzer to identify unused fields, followed by DROP FIELD statements to remove those fields from the data model. Roland Vecera offered an interesting discovery: DROP FIELD after a BINARY LOAD does not free the expected amount of memory.

For example (the field name here is for illustration only):

Binary dropfieldstest_dm.qvw;
DROP FIELD SomeUnusedField;

Roland has found that a subsequent LOAD RESIDENT of each affected table is required to fully reduce disk and RAM consumption to the expected level.
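A minimal sketch of that resident-reload fix, assuming a table named “Facts” from which fields were dropped (the table name is a placeholder):

```
// Rebuild the table so the records are rewritten without the dropped field's pointers.
Facts_Fixed:
NOCONCATENATE LOAD * RESIDENT Facts;  // copy the remaining fields row by row
DROP TABLE Facts;                     // discard the original, bloated table
RENAME TABLE Facts_Fixed TO Facts;    // restore the original table name
```

Repeat the pattern for each table that had fields dropped. NOCONCATENATE is required because the copy has identical fields and would otherwise auto-concatenate onto the original table.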

A field in a QVW is represented by three storage structures:
1. Symbol table, which stores the unique values of the field.
2. Record pointers, a pointer on each table row to the symbol value.
3. State space, where field selections are tracked.

Based on testing and calculation, my interpretation is that in this scenario (BINARY LOAD followed by DROP FIELD), the Symbols and State space are released. However, the space occupied by the Record pointers is not released, i.e. the records are not rewritten. This can be a significant amount of space, particularly when a table contains many rows. As a rough example, a field with about 1,000 distinct values needs 10 bits of pointer per row, so on a 100-million-row table the orphaned pointers alone occupy roughly 125 MB.

For most developers this will be an obscure issue. But for people tuning large applications, this may provide an “aha” moment.

Thanks Roland!

5 thoughts on “DROP FIELD Does Not Release All Space”

  1. A very strange bug indeed, which can have a big impact, e.g. if one has a big all-including “data mart” QVW with other developers working via BINARY and dropping unused fields.
    As far as I understand, this is a bigger issue if the dropped fields are of high cardinality (?)

    Because LOAD RESIDENT can take a long time on a large fact table, here are some more ideas for a workaround:
    – File > Reduce Data > Keep Possible Values with an empty selection cleans up the unused space in memory and in the QVW (though it must be done manually before saving)
    – QV Publisher with a Reduce step works in a similar way, so that at least the user documents on the QV Server are cleaned up. If no actual reduction of the data is needed, one can reduce the document to all possible values of one field in the fact table, e.g. a counter field. Unfortunately, the cleanup is not done by Publisher without a Reduce step.

  2. Andrey,

    Thanks for the useful workarounds. Great ideas.

    Yes, high cardinality fields require more record bits for pointers, and so would have a bigger relative impact.

  3. I mean, a radical solution could be a QVD store and reload of all tables after the DROP FIELD command(s). This could be automated, also after a BINARY load, and will usually be faster than LOAD RESIDENT.
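    A sketch of that store-and-reload pattern, with placeholder table and file names:

    ```
    // After DROP FIELD, push each affected table out to QVD and read it back.
    STORE Facts INTO Facts.qvd (qvd);
    DROP TABLE Facts;
    Facts:
    LOAD * FROM Facts.qvd (qvd);  // optimized QVD load, typically faster than RESIDENT
    ```

    Because the reload selects all fields unchanged, it qualifies as an optimized QVD load, which is what makes this faster than LOAD RESIDENT on large tables.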

  4. Hi Vlad,

    The scenario given in the post was RAM consumption. I did not test saving the qvw and reopening or binary loading to see if the RAM dropped. That would be interesting to know.
