SWMR R/W in two processes: good, in same process: bad

guyholford · November 10, 2022, 7:11pm

I’m exploring the robustness of using SWMR to write, read, and display a realtime-ish stream of data, in C/C# on Windows.

If I have one app writing and another reading, it’s working as expected.

If I have both operations in one app, it works for a while and then fails at a random time with, e.g., System.AccessViolationException: ‘Attempted to read or write protected memory’ when reading.

If there are any ideas, I’d be grateful.

guyholford · November 10, 2022, 7:47pm

Oh I think it’s fixed - I had the read and write on different threads, mea culpa.

Still, writing down the question is a great way to see the solution. I also read a lot more documentation along the way, so that’s good.

derobins · November 10, 2022, 7:56pm

SWMR is about multiple processes, btw - not multiple threads. Different state issues.

guyholford · November 10, 2022, 8:18pm

Got it, thanks. Key thing for me is being able to write and read from the same process.

Currently, I’m opening the H5 file for writing, and opening again for reading so I have two file IDs. It works but is that a valid pattern, or should I write and read using the one file ID?

derobins · November 10, 2022, 10:00pm

When you open a file multiple times in HDF5, we try to determine if the file has already been opened and, if so, simply return a new ID for the already opened file. Under the hood, the file will only have been opened once, so the IDs will share a metadata cache and there will be no SWMR. This is true in a single process, whether you are using one thread or many. Multiple processes do not share state and thus can’t use this mechanism. In that case, each process will have its own metadata cache and sense of file state, hence SWMR is needed if one of those processes is a writer.

derobins · November 10, 2022, 10:01pm

The underlying file structure that we maintain in the library is reference counted, by the way. You can create and close IDs for it in any order and not make a mess.
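The behaviour described in the two posts above can be pictured with a toy model. This is a sketch in plain Python, not HDF5 code: `Library`, `SharedFile`, `open`, `close`, `shared`, and `is_open` are hypothetical names invented for illustration, and the real library's bookkeeping is more involved. It only shows the idea that a second open of the same path yields a new ID for the same underlying object, and that IDs can be closed in any order.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SharedFile:
    """Stand-in for the single underlying file object (hypothetical)."""
    path: str
    refcount: int = 0
    # Stands in for the metadata cache shared by every ID of this file.
    cache: dict = field(default_factory=dict)


class Library:
    """Toy registry that hands out IDs for reference-counted files."""

    def __init__(self):
        self._by_path = {}        # path -> SharedFile, for files currently open
        self._by_id = {}          # file ID -> SharedFile
        self._next_id = count(1)

    def open(self, path):
        # Reuse the existing object if this path is already open.
        shared = self._by_path.get(path)
        if shared is None:
            shared = SharedFile(path)
            self._by_path[path] = shared
        shared.refcount += 1
        file_id = next(self._next_id)
        self._by_id[file_id] = shared
        return file_id

    def close(self, file_id):
        shared = self._by_id.pop(file_id)
        shared.refcount -= 1
        if shared.refcount == 0:
            # Last ID gone: only now is the underlying file really closed.
            del self._by_path[shared.path]

    def shared(self, file_id):
        return self._by_id[file_id]

    def is_open(self, path):
        return path in self._by_path


lib = Library()
writer_id = lib.open("stream.h5")
reader_id = lib.open("stream.h5")

# Two distinct IDs, one underlying object, one cache.
lib.shared(writer_id).cache["/group/dataset"] = "new metadata"
print(lib.shared(reader_id).cache)

# Closing the first ID leaves the file open for the second.
lib.close(writer_id)
print(lib.is_open("stream.h5"))
lib.close(reader_id)
print(lib.is_open("stream.h5"))
```

In this model the "reader" sees the "writer's" update immediately because there is only one cache, which is the reason no SWMR-style coordination is needed inside a single process.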

guyholford · November 10, 2022, 10:24pm

Ah, that’s interesting, thank you. Regarding the reference counting: I thought I was going to have to keep track of which files were open for writing and be sure not to close them after reading, but this is good functionality.

Does it mean that I can happily read and write a file from the one process and not have to open it in SWMR mode at all? If so, that would be great for what we need to begin with. I’ve just tried it and it appears to work.

Even so, I’m glad to have delved into SWMR; I bet we’ll have a good use for it.

derobins · November 10, 2022, 10:48pm

Yes, from one process you do not need SWMR, even if you open the file multiple times and access it through multiple file IDs.

guyholford · November 11, 2022, 8:05am

Thank you for the help here. I’m very keen to go with the grain, to avoid gotchas that only emerge later. So these extra insights are super helpful.

So can I just ask one more thing: is this a legit design as far as H5 is concerned? A single app/process that…

Creates an H5 file in non-SWMR mode
Reads and writes from the same thread (with no need for interlocks between the two operations)
Writes new data to datasets
Creates groups and datasets on the fly

Currently this is working fine, I had it on a soak test all night. The resulting file opened fine in HDFViewer and some python scripts, with all the groups and datasets there as expected.

But if it’s only working by accident, so to speak, it would be helpful to understand that.

gheber · November 11, 2022, 11:58am

That sounds like “HDF5 in action,” helping to solve another problem and doing good things.

G.

guyholford · November 11, 2022, 12:56pm

That’s a ‘yes’, I think. Fantastic - onwards and upwards.

I’m feeling very positive about HDF5 at the moment. Thank you to the team.

hyoklee · November 11, 2022, 2:51pm

Have you tested SWMR with parallel option?
Don’t blame me for hurting your feelings!

guyholford · November 11, 2022, 3:24pm

Parallel - no, not yet. I need to learn to walk before I can run.

derobins · November 11, 2022, 4:22pm

SWMR is not supported with parallel HDF5. It will work fine with the parallel-enabled library, but we don’t test it with the MPI-I/O VFD and parallel HDF5 so it’s officially unsupported.

hyoklee · November 11, 2022, 4:25pm

Would you please make configure or cmake fail automatically when a user enables them both?

derobins · November 11, 2022, 4:27pm

SWMR isn’t a configure option.

hyoklee · November 11, 2022, 4:34pm

Right! My mistake! I was confused with --enable-*-vfd options. I thought one of them was --enable-swmr-vfd.

BTW, why not make it a configuration option? Disable SWMR by default and enable it only when a user asks.

derobins · November 11, 2022, 4:57pm

I’m not a fan of configure options, as they make the library configuration and testing space exponentially more complicated. Using them as binary “build this or don’t” flags is fine (e.g., Fortran, Java wrappers), but when you change library behavior with configuration options, you get 2^n different libraries to test.
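The 2^n figure follows from simple counting: each on/off option that changes behavior doubles the number of distinct builds. A small sketch of that arithmetic in Python; the option names are made up for illustration and are not real HDF5 configure flags.

```python
from itertools import product


def build_variants(options):
    """Return every on/off combination of the given behavior-changing options."""
    return [dict(zip(options, values))
            for values in product([False, True], repeat=len(options))]


# Hypothetical option names, for illustration only.
options = ["feature_a", "feature_b", "feature_c"]
variants = build_variants(options)

# Three options give 2 ** 3 distinct library configurations.
print(len(variants))

# The count doubles with every option added.
for n in (1, 5, 10):
    print(n, 2 ** n)
```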