Error when writing data


#1

Dear Experts,

I am using HDF5 to write a very large dataset (about 1 TB) and encountered the following error. The same code, compiled the same way, runs fine for a smaller test (~1 GB of data). I suspect a memory problem, but the program did not report any failure when creating hdf5_dataspace_memory.
Any suggestions or help would be much appreciated.

Error message:
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 0:
#000: H5Dio.c line 336 in H5Dwrite(): can’t write data
major: Dataset
minor: Write failed
#001: H5Dio.c line 798 in H5D__write(): unable to initialize storage
major: Dataset
minor: Unable to initialize object
#002: H5Dint.c line 2245 in H5D__alloc_storage(): unable to initialize contiguous storage
major: Low-level I/O
minor: Unable to initialize object
#003: H5Dcontig.c line 173 in H5D__contig_alloc(): unable to reserve file space
major: Low-level I/O
minor: No space available for allocation
#004: H5MF.c line 851 in H5MF_alloc(): allocation failed from aggr/vfd
major: Resource unavailable
minor: Can’t allocate space
#005: H5MFaggr.c line 124 in H5MF_aggr_vfd_alloc(): can’t allocate raw data
major: Resource unavailable
minor: Can’t allocate space
#006: H5MFaggr.c line 221 in H5MF__aggr_alloc(): ‘normal’ file space allocation request will overlap into ‘temporary’ file space
major: Resource unavailable
minor: Out of range


Code:

start[0] = pcsum;
start[1] = 0;

count[0] = pc;
count[1] = get_values_per_blockelement(blocknr);
pcsum += pc;

H5Sselect_hyperslab(hdf5_dataspace_in_file, H5S_SELECT_SET,
                    start, NULL, count, NULL);

dims[0] = pc;
dims[1] = get_values_per_blockelement(blocknr);

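/* The assignment must be parenthesized before the comparison; otherwise the
   variable receives the result of `... < 0' instead of the dataspace id. */
if ((hdf5_dataspace_memory = H5Screate_simple(rank, dims, NULL)) < 0)
   {
    printf("failed to allocate memory for `hdf5 data space memory' (dims[0]: %lld x dims[1] %lld ~ %g MB).\n",
           (long long) dims[0], (long long) dims[1], dims[0] * dims[1] * 4 / (1024.0 * 1024.0));
    report_memory_usage(&HighMark_run, "RUN");
    endrun(1238);
   }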

hdf5_status =
H5Dwrite(hdf5_dataset, hdf5_datatype,
         hdf5_dataspace_memory,
         hdf5_dataspace_in_file, H5P_DEFAULT, CommBuffer);

H5Sclose(hdf5_dataspace_memory);

Weiguang


#2

Hi,

Can you check whether you have enough disk space to write the data? Note that the H5Screate_simple call doesn't allocate space on the file system.

Thank you!

Elena
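
A minimal sketch of how the free-space check suggested above could be done from C on a POSIX system. The helper name free_bytes_on is hypothetical (it is not part of HDF5 or of the original code), and on a shared HPC file system the number reported by statvfs may not reflect per-user quotas, so treat it as a first sanity check only.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: returns the number of bytes available to an
 * unprivileged user on the file system that holds `path`, or -1 if
 * statvfs() fails (for example, when the path does not exist). */
long long free_bytes_on(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes */
    return (long long) ((unsigned long long) vfs.f_bavail *
                        (unsigned long long) vfs.f_frsize);
}

/* Prints the available space in GB for the given directory. */
void print_free_space(const char *path)
{
    long long avail = free_bytes_on(path);

    if (avail < 0)
        printf("could not query file system for '%s'\n", path);
    else
        printf("%s: %.1f GB available\n", path,
               avail / (1024.0 * 1024.0 * 1024.0));
}
```

Calling print_free_space("./snaps") just before the H5Dwrite call would show whether the target directory has room for the block about to be written.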


#3

Hi
Sorry for my late reply.
Yes, I can confirm that there is enough disk space and memory for the data. However, I am not sure whether the system imposes a limit on the maximum file size. I am using an HPC server, so I have no idea how to check that.
I have now installed HDF5 1.10.6, and the error changed to this:

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
#000: H5Dio.c line 336 in H5Dwrite(): can’t write data
major: Dataset
minor: Write failed
#001: H5Dio.c line 820 in H5D__write(): can’t write data
major: Dataset
minor: Write failed
#002: H5Dcontig.c line 658 in H5D__contig_write(): contiguous write failed
major: Dataset
minor: Write failed
#003: H5Dselect.c line 314 in H5D__select_write(): write error
major: Dataspace
minor: Write failed
#004: H5Dselect.c line 225 in H5D__select_io(): write error
major: Dataspace
minor: Write failed
#005: H5Dcontig.c line 1280 in H5D__contig_writevv(): can’t perform vectorized sieve buffer write
major: Dataset
minor: Can’t operate on object
#006: H5VM.c line 1500 in H5VM_opvv(): can’t perform operation
major: Dataset
minor: Write failed
#007: H5Dcontig.c line 1028 in H5D__contig_writevv_sieve_cb(): block write failed
major: Dataset
minor: Write failed
#008: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#009: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#010: H5Faccum.c line 826 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#011: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#012: H5FDsec2.c line 829 in H5FD_sec2_write(): file write failed: time = Wed Jun 3 04:31:11 2020
, filename = ‘./snaps//snapdir_009/snap_009.0.hdf5’, file descriptor = 24, errno = 27, error message = ‘File too large’, buf = 0x2b283d935058, total write size = 14043372, bytes this sub-write = 14043372, bytes actually written = 18446744073709551615, offset = 0

It seems that the code cannot write an array of ~13 GB. I am pretty sure there is something wrong with the HPC system, because the same code runs fine on another machine and can write 200 GB of data into a file. Any suggestions on what to check, and how, are welcome.
Many thanks.
Regards,
Weiguang
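
On Linux, errno 27 is EFBIG ("File too large"), and the value 18446744073709551615 reported as "bytes actually written" is -1 printed as an unsigned 64-bit integer, i.e. the write call itself failed. A minimal sketch for checking the two usual suspects from inside the program follows: the per-process file size limit (what `ulimit -f` shows in the shell) and the width of off_t in the build. The function names are hypothetical, and this only illustrates one way to inspect these limits; it does not establish that either one is the cause here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/resource.h>

/* Hypothetical helper: queries the soft RLIMIT_FSIZE limit.
 * Returns 1 if the limit is unlimited, 0 if it is finite (the value
 * in bytes is stored in *limit_out), and -1 if getrlimit() fails. */
int file_size_limit(unsigned long long *limit_out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    *limit_out = (unsigned long long) rl.rlim_cur;
    return 0;
}

/* Prints the file size limit and the size of off_t for this build. */
void report_file_size_limits(void)
{
    unsigned long long limit = 0;
    int rc = file_size_limit(&limit);

    if (rc < 0)
        printf("getrlimit(RLIMIT_FSIZE) failed\n");
    else if (rc == 1)
        printf("file size limit: unlimited\n");
    else
        printf("file size limit: %llu bytes\n", limit);

    /* A 4-byte off_t would mean the program was built without
     * large-file support and cannot address offsets beyond 2 GB. */
    printf("sizeof(off_t) = %zu bytes\n", sizeof(off_t));
}
```

Calling report_file_size_limits() once on each MPI rank at startup would show whether the batch environment sets a different limit from the interactive shell.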


#4

It is very strange…
When I compile my program with -lhdf5 -ldl -lz, I get this error:

#012: H5FDsec2.c line 829 in H5FD_sec2_write(): file write failed: time = Wed Jun 3 04:31:11 2020, filename = ‘./snaps//snapdir_009/snap_009.0.hdf5’, file descriptor = 24, errno = 27, error message = ‘File too large’, buf = 0x2b283d935058, total write size = 14043372, bytes this sub-write = 14043372, bytes actually written = 18446744073709551615, offset = 0

While with only -lhdf5, I get back to the original error:

#006: H5MFaggr.c line 221 in H5MF__aggr_alloc(): ‘normal’ file space allocation request will overlap into ‘temporary’ file space

With either choice (and no matter how many HDF5 files I split the data across), the failed output is always 3.3 GB in total. However, I wrote a simple HDF5 test program, which can save over 20 GB of data without a problem. So I suppose this is something connected with MPI as well.

Many thanks for your suggestions.
Regards,
Weiguang


#5

Just a naive question: my code is compiled with MPI + OpenMP and with -DH5_USE_16_API. Could this hybrid compilation confuse HDF5 and cause such a problem?
Thank you.