Parallel HDF5 and writing region references

Dear All,

Apologies for cross-posting! If anyone has come across the issue below,
please let me know.

I was trying to write a compound datatype (containing a region reference) in
parallel using a collective operation. With h5dump I saw that the region
references were not created correctly. I looked into the "Collective Calling
Requirements" document and then realised that writing region references is
not supported with parallel I/O. I tried a simple test example, which gives
the error "*H5Dio.c line 664 in H5D__write(): Parallel IO does not support
writing region reference datatypes yet*".
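
For reference, the test was roughly along these lines (a minimal sketch only;
the file name, dataset names, and sizes are made up for illustration, and
error checking is omitted):

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* file access property list for MPI-IO */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("refs.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* a target dataset that the region references will point into */
    hsize_t data_dims[1] = {(hsize_t)nranks * 100};
    hid_t data_space = H5Screate_simple(1, data_dims, NULL);
    hid_t data_dset  = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, data_space,
                                  H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* one region reference per rank */
    hsize_t ref_dims[1] = {(hsize_t)nranks};
    hid_t ref_space = H5Screate_simple(1, ref_dims, NULL);
    hid_t ref_dset  = H5Dcreate2(file, "refs", H5T_STD_REF_DSETREG, ref_space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* each rank references its own 100-element slab of /data */
    hsize_t start[1] = {(hsize_t)rank * 100}, count[1] = {100};
    H5Sselect_hyperslab(data_space, H5S_SELECT_SET, start, NULL, count, NULL);
    hdset_reg_ref_t ref;
    H5Rcreate(&ref, file, "data", H5R_DATASET_REGION, data_space);

    /* select this rank's element in /refs and write collectively */
    hsize_t one[1] = {1}, offset[1] = {(hsize_t)rank};
    hid_t mem_space  = H5Screate_simple(1, one, NULL);
    hid_t file_space = H5Dget_space(ref_dset);
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, offset, NULL, one, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* this is where the "Parallel IO does not support writing region
       reference datatypes yet" error is reported */
    H5Dwrite(ref_dset, H5T_STD_REF_DSETREG, mem_space, file_space, dxpl, &ref);

    /* cleanup of the remaining handles omitted for brevity */
    H5Dclose(data_dset);
    H5Dclose(ref_dset);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}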

But note that this error doesn't appear if I write a compound datatype
containing a region reference. Is this allowed/possible? (i.e. first create
the region reference, store it as a member of a compound datatype, and then
write the compound datatype in parallel collectively).
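
Building on the sketch above, the compound variant I mean looks roughly like
this (again only a sketch; the struct layout and member names are
illustrative):

/* compound type holding some per-cell metadata plus a region reference;
   reuses the handles from the previous sketch */
typedef struct {
    int             gid;   /* e.g. a cell id          */
    hdset_reg_ref_t ref;   /* region reference member */
} cell_t;

hid_t cell_type = H5Tcreate(H5T_COMPOUND, sizeof(cell_t));
H5Tinsert(cell_type, "gid", HOFFSET(cell_t, gid), H5T_NATIVE_INT);
H5Tinsert(cell_type, "ref", HOFFSET(cell_t, ref), H5T_STD_REF_DSETREG);

/* creating a dataset of cell_type and calling H5Dwrite() on it with the
   collective transfer property list does not raise the explicit error,
   but the references shown by h5dump are not correct */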

The dataset that I am creating is for neuron cells, which are very diverse
in terms of size. Each MPI rank is writing variable-length data to the
dataset, and I thought a region reference would be helpful, but now I have
this issue while writing region references.

What are possible alternatives? I can think of:
- write a <count,offset> pair to emulate a region reference (see the sketch
after this list), though this loses flexibility: other users are going to
read the datasets with tools/Python libraries and would then have to
interpret the count+offset pair themselves
- another option could be to write the datasets first and then have a single
rank create and write the region references (each cell has thousands of
region references, and for large simulations this may not scale (?))
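
For the first alternative, the pair could be stored as a small compound
datatype along the lines below (only a sketch; the struct layout, field
names, and the /data dataset path are illustrative):

typedef struct {
    unsigned long long offset;  /* first element of this cell in /data */
    unsigned long long count;   /* number of elements belonging to it  */
} slab_t;

hid_t slab_type = H5Tcreate(H5T_COMPOUND, sizeof(slab_t));
H5Tinsert(slab_type, "offset", HOFFSET(slab_t, offset), H5T_NATIVE_ULLONG);
H5Tinsert(slab_type, "count",  HOFFSET(slab_t, count),  H5T_NATIVE_ULLONG);

/* a dataset of slab_type can be written collectively without any special
   restrictions; readers reconstruct a cell's region as
   data[offset : offset + count] */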

Any other suggestions?

Thanks,
Pramod

Hi Pramod,


From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of pramod kumbhar
Sent: Monday, November 23, 2015 9:25 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Parallel hdf5 and writing region references

Dear All,

Apologies for cross-posting! If anyone has come across the issue below, please let me know.

I was trying to write a compound datatype (containing a region reference) in parallel using a collective operation. With h5dump I saw that the region references were not created correctly. I looked into the "Collective Calling Requirements" document and then realised that writing region references is not supported with parallel I/O. I tried a simple test example, which gives the error "H5Dio.c line 664 in H5D__write(): Parallel IO does not support writing region reference datatypes yet".

But note that this error doesn't appear if I write a compound datatype containing a region reference. Is this allowed/possible? (i.e. first create the region reference, store it as a member of a compound datatype, and then write the compound datatype in parallel collectively).

[msc] This isn’t possible to do, and the fact that it does not return the same failure as writing a region reference directly indicates that this is a bug in the library and we should address it. I created a ticket for this (HDFFV-9619 <https://jira.hdfgroup.org/browse/HDFFV-9619>).

The dataset that I am creating is for neuron cells, which are very diverse in terms of size. Each MPI rank is writing variable-length data to the dataset, and I thought a region reference would be helpful, but now I have this issue while writing region references.

What are possible alternatives? I can think of:
- write a <count,offset> pair to emulate a region reference, though this loses flexibility: other users are going to read the datasets with tools/Python libraries and would then have to interpret the count+offset pair themselves
- another option could be to write the datasets first and then have a single rank create and write the region references (each cell has thousands of region references, and for large simulations this may not scale (?))

[msc] Unfortunately, variable-length data and region references are not supported in parallel. There haven’t been enough use cases to push support for them. I believe there is a way to add support, but it would require some engineering effort.
Another approach you could try is one file per process, but I am not sure you want to deal with a large number of files after data generation. Or you could access the file one process at a time in a round-robin fashion (poor man’s parallel I/O, sketched below), but again this might not be very scalable.
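
The round-robin idea can be done with simple token passing, something like the fragment below (rank/nranks come from MPI_Comm_rank/MPI_Comm_size as in your earlier code; the file name is a placeholder and error checking is omitted):

/* round-robin serial access: each rank in turn reopens the file with the
   default (serial) driver and writes its own region references */
int token = 0;
if (rank > 0)
    MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

hid_t file = H5Fopen("neurons.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* ... H5Rcreate() this rank's references and H5Dwrite() them with the
   default (independent) transfer property list ... */
H5Fclose(file);

if (rank < nranks - 1)
    MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);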

Maybe others who have experience working with region references and VL datatypes have other alternatives and can chime in. I haven’t dealt with use cases in parallel HDF5 that required VL datatypes.

Thanks,
Mohamad

Any other suggestions?

Thanks,
Pramod