Error in saving large data

Dear experts:

I encountered a problem with HDF5 when writing a very large dataset into a single HDF5 file. The code runs fine for a smaller test, and I have checked the memory usage, which seems fine. Any suggestions are welcome.
Many thanks.
Weiguang

The error information:
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 0:
#000: H5Dio.c line 336 in H5Dwrite(): can’t write data
major: Dataset
minor: Write failed
#001: H5Dio.c line 798 in H5D__write(): unable to initialize storage
major: Dataset
minor: Unable to initialize object
#002: H5Dint.c line 2245 in H5D__alloc_storage(): unable to initialize contiguous storage
major: Low-level I/O
minor: Unable to initialize object
#003: H5Dcontig.c line 173 in H5D__contig_alloc(): unable to reserve file space
major: Low-level I/O
minor: No space available for allocation
#004: H5MF.c line 851 in H5MF_alloc(): allocation failed from aggr/vfd
major: Resource unavailable
minor: Can’t allocate space
#005: H5MFaggr.c line 124 in H5MF_aggr_vfd_alloc(): can’t allocate raw data
major: Resource unavailable
minor: Can’t allocate space
#006: H5MFaggr.c line 221 in H5MF__aggr_alloc(): ‘normal’ file space allocation request will overlap into ‘temporary’ file space
major: Resource unavailable
minor: Out of range

The relevant piece of the code:

    start[0] = pcsum;
    start[1] = 0;

    count[0] = pc;
    count[1] = get_values_per_blockelement(blocknr);
    pcsum += pc;

    /* select the slab of the file dataspace covered by this write */
    H5Sselect_hyperslab(hdf5_dataspace_in_file, H5S_SELECT_SET,
                        start, NULL, count, NULL);

    dims[0] = pc;
    dims[1] = get_values_per_blockelement(blocknr);

    if ((hdf5_dataspace_memory = H5Screate_simple(rank, dims, NULL)) < 0)
      {
        printf("failed to allocate memory for `hdf5 data space memory' (dims[0]: %llu x dims[1] %llu ~ %g MB).\n",
               dims[0], dims[1], dims[0] * dims[1] * 4 / (1024.0 * 1024.0));
        report_memory_usage(&HighMark_run, "RUN");
        endrun(1238);
      }

    hdf5_status = H5Dwrite(hdf5_dataset, hdf5_datatype,
                           hdf5_dataspace_memory,
                           hdf5_dataspace_in_file, H5P_DEFAULT, CommBuffer);

    H5Sclose(hdf5_dataspace_memory);

Posting a Minimum Working Example would likely increase the number of responses; with only a partial snippet, anyone who wants to help first has to write the missing parts themselves.
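
For instance, a self-contained skeleton along these lines – with your actual ranks, sizes and datatype filled in – would be enough for others to run. Everything below (file name, shapes, block size) is illustrative, not taken from your program:

    /* Illustrative skeleton of a minimal reproducer: create one 2-D dataset and
       write it hyperslab by hyperslab, as in the snippet above. All sizes are
       placeholders. */
    #include <hdf5.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
      const hsize_t nrows = 1000000, ncols = 3;          /* placeholder shape */
      hsize_t dims[2]  = { nrows, ncols };
      hsize_t count[2] = { 100000, ncols };              /* rows per write    */
      hsize_t start[2] = { 0, 0 };

      hid_t file   = H5Fcreate("repro.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      hid_t fspace = H5Screate_simple(2, dims, NULL);
      hid_t dset   = H5Dcreate(file, "data", H5T_NATIVE_FLOAT, fspace,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      float *buf   = calloc((size_t)count[0] * ncols, sizeof(float));

      /* write the dataset block by block, advancing the row offset each time */
      for (hsize_t row = 0; row < nrows; row += count[0])
        {
          start[0] = row;
          H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
          hid_t mspace = H5Screate_simple(2, count, NULL);
          if (H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf) < 0)
            fprintf(stderr, "write failed at row %llu\n", (unsigned long long)row);
          H5Sclose(mspace);
        }

      free(buf);
      H5Dclose(dset);
      H5Sclose(fspace);
      H5Fclose(file);
      return 0;
    }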

If C++ is an option for you, here are some worked-out examples with H5CPP for the major linear algebra systems. Because of the zero-copy implementation you can load as much data as the computing device's memory allows.
Chunked/partial I/O allows out-of-core processing…
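
In the plain C API your code already uses, the same idea – chunked storage plus partial writes – looks roughly like the sketch below; the dataset name, datatype and chunk size are made up for illustration:

    /* Sketch: a chunked, extendible dataset, so data can be written and the
       dataset grown block by block instead of allocating all storage up front.
       Names and sizes are illustrative. */
    #include <hdf5.h>

    hid_t create_chunked_dataset(hid_t file)
    {
      hsize_t dims[2]    = { 0, 3 };
      hsize_t maxdims[2] = { H5S_UNLIMITED, 3 };
      hsize_t chunk[2]   = { 65536, 3 };                 /* 64k rows per chunk */

      hid_t space = H5Screate_simple(2, dims, maxdims);
      hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
      H5Pset_chunk(dcpl, 2, chunk);

      hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_FLOAT, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
      H5Pclose(dcpl);
      H5Sclose(space);
      return dset;   /* grow with H5Dset_extent(), then write hyperslabs */
    }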

best: steve

That would have to be a very, very large dataset: the “temporary” address space starts at 8 EiB. :slight_smile: As Steven Varga says, posting a reproducer would help.

Quincey

Thanks for the suggestion. It is a very large program with a very complex dataset, so it wouldn’t be easy to put together a reproducer.
Is it possible to know what can cause the error “‘normal’ file space allocation request will overlap into ‘temporary’ file space”?
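
A sanity check that may be worth adding (the helper below is illustrative, not part of the original code): log each write request and the current file size just before H5Dwrite, so an overflowed or negative offset counter – one possible way to end up near the 8 EiB “temporary” region mentioned above – shows up immediately.

    /* Illustrative helper: print each write request and the current file size,
       so absurdly large offsets or sizes are visible before H5Dwrite fails. */
    #include <stdio.h>
    #include <hdf5.h>

    static void log_write_request(hid_t file, const hsize_t start[2],
                                  const hsize_t count[2], size_t elem_size)
    {
      hsize_t filesize = 0;
      H5Fget_filesize(file, &filesize);
      fprintf(stderr,
              "write request: start=(%llu,%llu) count=(%llu,%llu) ~ %.1f MB, "
              "file size so far %llu bytes\n",
              (unsigned long long)start[0], (unsigned long long)start[1],
              (unsigned long long)count[0], (unsigned long long)count[1],
              (double)count[0] * count[1] * elem_size / (1024.0 * 1024.0),
              (unsigned long long)filesize);
    }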

Can you share the size of the dataset: dimensions, datatype and what you want to do with it?

ps.: most software systems are large and complex – part of the work is to break the problem down / reduce it to the minimal set of lines that still reproduces the problem and nothing more – or to define a small but equivalent problem that, once solved, allows you to solve the original, large one…

When you don’t have the time/expertise to do this breakdown, you may turn to a consultant to do the reduction phase for you – in return for the price you pay, you get to enjoy your free time… However, this is a community-based forum, hence I am suggesting that you provide the Minimum Working Example that defines your problem.

In any event, I am curious about the size of the dataset, so please let me know…
steve

I don’t think it is a problem with the size of the dataset, because I used a simple MPI test program (a similar writing process, on the same machine) to write ~30 GB of data without a problem.

Just a naive question: my code is compiled with MPI+OpenMP and with -DH5_USE_16_API. Can this hybrid build confuse HDF5 and cause such a problem? The same code, compiled in a similar way on another machine, did not report this error. That really puzzles me.

Thank you.

ps: the failing snapshot file is always about 3.3 GB, even if I split the snapshot into smaller HDF5 files (total size is also ~3.3 GB).