Flat vs Nested Best Practices

I’m experimenting with HDF5 / h5py and looking for advice on how deep to nest my data.

My typical lab workflow is to collect data in a session, during which I’ll test a device in several configs. Typically I’ll repeat measurements for each config. If I use this natural division of data, I end up with very deep nesting:

/session/device/config/repetition/meas_type

This seems onerous to traverse, and it also makes me wonder which level I’d attach attrs to (always at the deepest level, or a mixture of levels?)

Any advice on this stuff? Flatter is better? Nested is better? I’m leaning towards some encoding, like:

/session/device-config/repetition/meas_type

…and always attaching attrs to the device-config level.

Any tips appreciated!

–John Brodie

“looking for advice on how deep to nest my data” …

Here is some generic advice that is not specific to HDF5. For data collection in general, I recommend many small, separate files rather than a complex file structure with deep nesting. This lets you rely on well-established system tools for file integrity, backup, cataloging, and performance. Design a file naming convention that encodes the hierarchy (for example session, device, and config). If downstream applications or archiving require it, you can always aggregate the collection files into larger units later. On the other hand, remember that utilities like tar and zip are excellent aggregators in their own way.

At minimum, I suggest a separate file for each device in each session. Going all the way to a separate file for each repetition may or may not be overkill.
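To make the naming convention idea concrete, here is a minimal sketch using only the Python standard library. The function name `measurement_path`, the underscore separator, the `rNNN` repetition suffix, and the `.h5` extension are all assumptions for illustration, not an established convention; adapt them to whatever your lab already uses.

```python
from pathlib import Path


def measurement_path(root, session, device, config=None, repetition=None, ext=".h5"):
    """Build root/<session>/<session>_<device>[_<config>][_rNNN]<ext>.

    Hypothetical helper: the layout and separators are illustrative choices.
    """
    parts = [session, device]
    if config is not None:
        parts.append(config)
    if repetition is not None:
        # Zero-pad so that a plain alphabetical sort matches numeric order.
        parts.append(f"r{repetition:03d}")
    return Path(root) / session / ("_".join(parts) + ext)


# One file per device per session (the suggested minimum):
per_device = measurement_path("data", "2024-01-15", "devA")

# One file per repetition (finer grained, possibly overkill):
per_repetition = measurement_path("data", "2024-01-15", "devA", "cfg2", 7)
```

Because the session appears both as a directory and as a filename prefix, a file stays identifiable even if it is copied out of its directory.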

In general, I suggest attributes at multiple levels, each attached at the level where it applies collectively. Put device-related attributes at the device level, config-related attributes at the config level, and so on. Try to standardize the names of important attributes at each level, to aid future aggregation.
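If you do keep some nesting inside an HDF5 file, the multi-level attribute idea might look like the sketch below in h5py. The group names, attribute names, and the helper `collect_attrs` are hypothetical, and the file is created in memory (`driver="core"`, `backing_store=False`) only so the example leaves nothing on disk. The helper walks from the file root down to a node and merges the attributes it finds, so a dataset effectively inherits everything set above it.

```python
import h5py
import numpy as np


def collect_attrs(node):
    """Merge attributes from the file root down to `node`.

    Hypothetical helper: deeper levels override shallower ones on name clashes.
    """
    merged = dict(node.file.attrs.items())  # attributes on the root group
    current = node.file
    for name in (p for p in node.name.split("/") if p):
        current = current[name]
        merged.update(current.attrs.items())
    return merged


# In-memory file so nothing is written to disk (illustrative names throughout).
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    dev = f.create_group("session01/devA")  # intermediate groups are created
    dev.attrs["serial"] = "SN-0001"         # device-level attribute

    cfg = dev.create_group("config1")
    cfg.attrs["bias_v"] = 1.5               # config-level attribute

    dset = cfg.create_dataset("rep000/iv_curve", data=np.arange(5.0))
    dset.attrs["units"] = "A"               # dataset-level attribute

    merged = collect_attrs(dset)
    dset_path = dset.name
```

With a helper like this, the depth of the hierarchy matters less when reading: you ask for one dataset and get back the device, config, and measurement metadata together.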

Thanks, this sounds like good advice. I’ll give it a shot.