Hi,
I’m working with a simulation that uses an old version of HDF5 (1.8.21) and writes timeseries data in chunks of the following shape:
[1 timestep] x [# variables] x [# items]
This is the worst possible chunking layout for timeseries access: reading a single series back means one chunk read per timestep, which is simply too many reads when dealing with hundreds of thousands of timesteps.
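To make the layout concrete, here is a minimal sketch of the pattern I’m describing (the dataset name and sizes are placeholders, not the simulation’s actual ones):

```python
import h5py
import numpy as np

n_vars, n_items = 16, 1000  # placeholder sizes

with h5py.File("sim.h5", "w") as f:
    # Extensible time axis; one chunk per timestep: (1, n_vars, n_items).
    dset = f.create_dataset(
        "timeseries",                 # hypothetical dataset name
        shape=(0, n_vars, n_items),
        maxshape=(None, n_vars, n_items),
        chunks=(1, n_vars, n_items),  # the problematic layout
        dtype="f8",
    )
    for t in range(100):              # stand-in for the simulation loop
        dset.resize(t + 1, axis=0)
        dset[t] = np.random.rand(n_vars, n_items)
```

Reading one series back, e.g. `dset[:, v, i]`, then touches every chunk in the file: one read per timestep.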
However, I’m not sure what the alternatives are, since I need real-time access to the data while the simulation is running, so I don’t think caching thousands of timesteps in memory before writing will work. How long the simulation takes to produce each timestep is also non-deterministic, so I can’t rely on any fixed write cadence.
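For clarity, this is the kind of write-side buffering I mean (again with placeholder names and sizes); it gives nice time-spanning chunks but hides the most recent `BUF` timesteps from readers until a flush happens:

```python
import h5py
import numpy as np

BUF = 1024                  # timesteps cached in RAM before each write
N_STEPS = 4096              # placeholder total
n_vars, n_items = 16, 1000
buf = np.empty((BUF, n_vars, n_items))

with h5py.File("sim.h5", "w") as f:
    dset = f.create_dataset(
        "timeseries",
        shape=(0, n_vars, n_items),
        maxshape=(None, n_vars, n_items),
        chunks=(BUF, n_vars, n_items),  # one chunk spans BUF timesteps
        dtype="f8",
    )
    for t in range(N_STEPS):
        buf[t % BUF] = np.random.rand(n_vars, n_items)  # stand-in for one step
        if (t + 1) % BUF == 0:          # flush a whole chunk in one write
            dset.resize(t + 1, axis=0)
            dset[t + 1 - BUF : t + 1] = buf
```

That write-then-flush delay is exactly what breaks my real-time requirement.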
I’ve repacked the data after the simulation completes, but this is obviously not a viable solution for a consumer application, especially considering how slow repacking is.
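For reference, the repack step amounts to copying everything into a dataset with time-spanning chunks, roughly like this sketch (placeholder names again; `h5repack` with the `-l` layout option does the same from the command line):

```python
import h5py

TIME_CHUNK = 1024  # timesteps per chunk in the repacked file (placeholder)

# Equivalent in spirit to:  h5repack -l CHUNK=1024x16x1000 sim.h5 repacked.h5
with h5py.File("sim.h5", "r") as src, h5py.File("repacked.h5", "w") as dst:
    old = src["timeseries"]
    n_t, n_vars, n_items = old.shape
    new = dst.create_dataset(
        "timeseries",
        shape=old.shape,
        chunks=(min(TIME_CHUNK, n_t), n_vars, n_items),
        dtype=old.dtype,
    )
    for t0 in range(0, n_t, TIME_CHUNK):  # copy in time-major slabs
        new[t0 : t0 + TIME_CHUNK] = old[t0 : t0 + TIME_CHUNK]
```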
How does one actually chunk timeseries data optimally when it is written incrementally like this?
Thanks.