Object timestamps - useful or not?


#1

I’ve opened a question of whether h5py should disable writing object timestamps by default.

There are more details there, but to summarise: there’s a clear, albeit niche, use case for not wanting timestamps, namely being able to generate byte-for-byte identical HDF5 files. On the other hand, it’s not obvious how to get any timestamps out of h5py, or out of the HDF5 command line tools (h5ls, h5dump, h5stat), and the docs for H5Oget_info3 say that only ctime is implemented anyway. Going by discussions here and on h5py, it looks like no-one cares much about these limitations, maybe because timestamps aren’t used much.

But maybe I’m missing something? :slightly_smiling_face: If there’s some use case where the timestamps that are implemented are useful, and the existing ways to access them are sufficient, I’d like to know, so we can weigh that up against the use case where we know timestamps are a nuisance.


#2

@thomas1 hope you are well!? We met virtually at the last HUG event, I enjoyed your presentation! IMHO this is a very specific feature, one that you would possibly take on a day trip on a few special occasions, but in most cases it would just take up room in the rucksack, so to speak.

Truth be told, I am not a major consumer of Python; my use cases are always cross-platform: write in one language (C++), consume from others: h5py, Julia, MATLAB, R, … My point is that niche features with a cost attached have less value. By cost I mean maintenance, performance, … .
Personally, when I need a timestamp, I add it directly as an attribute, or as a field in the dataset, to signal intent.

best wishes: steven


#3

Thanks Steven, that’s useful to know.

I should have said explicitly that I’m interested in use cases no matter what tools & languages you use to work with HDF5, because files written by h5py may well be read by other tools. This is why I didn’t post in the h5py category. :slightly_smiling_face:


#4

Maybe a bit late to the discussion, but I’ll add that I made a similar decision to disable object timestamps by default in rhdf5. I had several users grumble that “identical” files produced different md5 sums, which confused some part of their workflow, and no one has given me the counterexample you’re looking for either.
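
To illustrate why an embedded creation time breaks checksum comparisons, here is a minimal toy sketch in Python. It does not use HDF5 or rhdf5 at all: `toy_file_bytes` and its header layout (a one-byte flag followed by an optional 8-byte timestamp) are hypothetical, invented only to mimic a container format that stores a timestamp next to the data.

```python
import hashlib
import struct
from typing import Optional


def toy_file_bytes(payload: bytes, timestamp: Optional[int]) -> bytes:
    """Build a toy 'file': flag byte, optional 8-byte timestamp, then payload.

    Hypothetical layout for illustration only; this is NOT the HDF5 format.
    """
    if timestamp is None:
        header = struct.pack("<B", 0)              # flag 0: no timestamp stored
    else:
        header = struct.pack("<BQ", 1, timestamp)  # flag 1 + seconds since epoch
    return header + payload


def md5_hex(data: bytes) -> str:
    """Return the md5 checksum of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


payload = b"same dataset contents"

# Same data, written one minute apart, with the write time embedded.
first = toy_file_bytes(payload, timestamp=1_583_000_000)
second = toy_file_bytes(payload, timestamp=1_583_000_060)

# Same data, no timestamp stored.
untracked_a = toy_file_bytes(payload, timestamp=None)
untracked_b = toy_file_bytes(payload, timestamp=None)

print(md5_hex(first) == md5_hex(second))              # False
print(md5_hex(untracked_a) == md5_hex(untracked_b))   # True
```

The payload is identical in all four cases; only the stored write time changes the checksum, which is the behaviour users were tripping over.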


#5

Thanks, that’s useful to know (and not too late :slight_smile: ). Can I ask how long ago you made that change and whether you’ve had any complaints since? I think that we’re most likely to find out about use cases from people complaining after we’ve changed the default, so if someone else has already tried the experiment…


#6

It looks like I made the change in March 2020 (https://github.com/grimbough/rhdf5/commit/50a79d3a3f98ccc1f950f725cf3961b416848bcd), which means it’s been in the wider release of rhdf5 since May 2020.

I completely agree that in my experience you’re more likely to find any dissenting voices after you’ve made the change, but so far no one’s complained about this particular change.


#7

Should we disable writing timestamps by default in our next major release of HDF5? How much incompatibility might this create?