Write to parallel HDF5 from MPI sub-communicators



Hi HDF5 community,

I’ve created a parallel HDF5 file and a number of groups in parallel over MPI_COMM_WORLD. After the file and groups are created, the file is closed, and different groups are then opened by different MPI sub-communicators for dataset creation and data output.

However, this results in a crash with the following backtrace:

HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 12:
  #000: H5F.c line 627 in H5Fopen(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #001: H5VLcallback.c line 3384 in H5VL_file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 3350 in H5VL__file_open(): open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_file.c line 97 in H5VL__native_file_open(): unable to open file
    major: File accessibility
    minor: Unable to open file
  #004: H5Fint.c line 1952 in H5F_open(): unable to read superblock
    major: File accessibility
    minor: Read failed
  #005: H5Fsuper.c line 615 in H5F__super_read(): truncated file: eof = 13144, sblock->base_addr = 0, stored_eof = 541160
    major: File accessibility
    minor: File has been truncated

My guess is that the crash is caused by inadequate space in the file when the different MPI sub-communicators write to it simultaneously. If so, I would need to reserve enough space for each group’s contents when I first create the groups, before closing the file. If that is right, how does one indicate the amount of space each group requires?

Here’s the output from h5debug, in case that helps:

[sajid@xrm-backup 256x64]$ h5debug sol.h5
Reading signature at address 0 (rel)
File Super Block...
File name (as opened):                             sol.h5
File name (after resolving symlinks):              sol.h5
File access flags                                  0x00000000
File open reference count:                         1
Address of super block:                            0 (abs)
Size of userblock:                                 0 bytes
Superblock version number:                         0
Free list version number:                          0
Root group symbol table entry version number:      0
Shared header version number:                      0
Size of file offsets (haddr_t type):               8 bytes
Size of file lengths (hsize_t type):               8 bytes
Symbol table leaf node 1/2 rank:                   4
Symbol table internal node 1/2 rank:               16
Indexed storage internal node 1/2 rank:            32
File status flags:                                 0x00
Superblock extension address:                      18446744073709551615 (rel)
Shared object header message table address:        18446744073709551615 (rel)
Shared object header message version number:       0
Number of shared object header message indexes:    0
Address of driver information block:               18446744073709551615 (rel)
Root group symbol table entry:
   Name offset into private heap:                  0
   Object header address:                          96
   Cache info type:                                Symbol Table
   Cached entry information:
      B-tree address:                              136
      Heap address:                                680
[sajid@xrm-backup 256x64]$

I’ve recently come across the Virtual Dataset feature, and I think it is a simpler route to the same end goal (combining multiple datasets from different files instead of writing to the same file). It might be easier if the above approach is too complicated.
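
For concreteness, here is a minimal sketch of what the Virtual Dataset route could look like, assuming serial h5py and NumPy rather than the MPI setup described above. The file names (`part_0.h5`, `sol_vds.h5`), the dataset name `data`, and the one-row-per-part layout are all hypothetical and chosen only for illustration. Each part file stands in for what one sub-communicator would write on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def write_parts(directory, n_parts, n_per_part):
    """Write one small file per (hypothetical) sub-communicator."""
    paths = []
    for k in range(n_parts):
        path = os.path.join(directory, f"part_{k}.h5")
        with h5py.File(path, "w") as f:
            # Fill each part with its own index so the rows are easy to tell apart.
            f.create_dataset("data", data=np.full(n_per_part, k, dtype="f8"))
        paths.append(path)
    return paths


def build_vds(vds_path, part_paths, n_per_part):
    """Create a file whose 'data' dataset maps row k onto part k's dataset."""
    layout = h5py.VirtualLayout(shape=(len(part_paths), n_per_part), dtype="f8")
    for k, path in enumerate(part_paths):
        layout[k] = h5py.VirtualSource(path, "data", shape=(n_per_part,))
    with h5py.File(vds_path, "w", libver="latest") as f:
        # fillvalue is what a reader sees if a source file is missing.
        f.create_virtual_dataset("data", layout, fillvalue=-1.0)


def demo():
    """Write three parts, stitch them together, and read the combined view."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = write_parts(tmp, n_parts=3, n_per_part=4)
        vds_path = os.path.join(tmp, "sol_vds.h5")
        build_vds(vds_path, parts, n_per_part=4)
        with h5py.File(vds_path, "r") as f:
            return f["data"][...]


if __name__ == "__main__":
    print(demo())
```

In this sketch no two writers ever touch the same file: the part files are written independently, and the virtual dataset only records where each row's data lives. Whether this fits the real use case depends on how the per-group datasets are shaped, which the sketch does not attempt to model.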

Thanks in advance!