Error when creating a simple dataset with chunks


#1

Hi there,

I am seeing this error when trying to create a dataset with chunks of a particular size. The minimal code to reproduce it is pretty simple:

#include <mpi.h>
#include "hdf5.h"

int main (int argc, char **argv)
{
    MPI_Init(&argc, &argv);
 
    // Set up file access property list with parallel I/O access
    hid_t plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL);

    // Create a new file collectively and release property list identifier
    hid_t file_id = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);

    // Create the dataspace for the dataset.
    hsize_t dimsf[3] = {782, 590, 768};
    hid_t filespace = H5Screate_simple(3, dimsf, NULL); 

    // Create the dataset creation property list, and set the chunk size
    hsize_t chunk[3] = {256, 256, 256};
    hid_t dcpl = H5Pcreate (H5P_DATASET_CREATE);
    herr_t status = H5Pset_chunk (dcpl, 3, chunk);

    // Create the chunked dataset.
    hid_t dset_id = H5Dcreate (file_id, "dset", H5T_STD_I32LE, filespace, H5P_DEFAULT, dcpl, H5P_DEFAULT);

    MPI_Finalize();
    return 0;
}

The code above fails at the dataset creation call with the following error:

> ./a.out 
mca_fbtl_posix_pwritev: error in writev:File too large
mca_fbtl_posix_pwritev: error in writev:File too large
mca_fbtl_posix_pwritev: error in writev:File too large
mca_fbtl_posix_pwritev: error in writev:File too large
mca_fbtl_posix_pwritev: error in writev:File too large
HDF5-DIAG: Error detected in HDF5 (1.10.5) MPI-process 0:
  #000: H5D.c line 145 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 329 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1557 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1798 in H5L__create_real(): can't insert link
    major: Links
    minor: Unable to insert object
  #004: H5Gtraverse.c line 851 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 627 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1604 in H5L__link_cb(): unable to create object
    major: Links
    minor: Unable to initialize object
  #007: H5Oint.c line 2453 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 300 in H5O__dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1278 in H5D__create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: H5Dint.c line 977 in H5D__update_oh_info(): unable to update layout/pline/efl header message
    major: Dataset
    minor: Unable to initialize object
  #011: H5Dlayout.c line 508 in H5D__layout_oh_create(): unable to initialize storage
    major: Dataset
    minor: Unable to initialize object
  #012: H5Dint.c line 2335 in H5D__alloc_storage(): unable to initialize dataset with fill value
    major: Dataset
    minor: Unable to initialize object
  #013: H5Dint.c line 2422 in H5D__init_storage(): unable to allocate all chunks of dataset
    major: Dataset
    minor: Unable to initialize object
  #014: H5Dchunk.c line 4402 in H5D__chunk_allocate(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #015: H5Dchunk.c line 4727 in H5D__chunk_collective_fill(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #016: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
    major: Low-level I/O
    minor: Write failed
  #017: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
    major: Page Buffering
    minor: Write failed
  #018: H5Faccum.c line 826 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #019: H5FDint.c line 258 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #020: H5FDmpio.c line 1876 in H5FD_mpio_write(): file write failed
    major: Low-level I/O
    minor: Write failed

Clearly the “file too large” message doesn’t make sense, because my dataset is not that big. Also, I noticed that slightly increasing or decreasing the chunk size can make it work.

I built with gcc 7.5.0 and linked against Open MPI 3.1.3 on Ubuntu 18.04.

Any suggestions or feedback will be most welcome. Thanks!


#2

How about closing all those handles? (This is C.)

    ...

    // Create the chunked dataset.
    hid_t dset_id = H5Dcreate (file_id, "dset", H5T_STD_I32LE, filespace, H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Pclose(dcpl);
    H5Dclose(dset_id);
    H5Sclose(filespace);
    H5Fclose(file_id);

    MPI_Finalize();
    return 0;
}

I call this the Powell rule: you acquire a handle, you own it! (See the post "Follow these simple rules and stay out of trouble.")

G.


#3

Alternatively, you could delegate all the chores to H5CPP templates and let RAII take care of the resources. You can always re-export your C++17 code with `extern "C"` and call it from an arbitrary language:

#include <mpi.h>
#include <h5cpp/all>

int main (int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Info info = MPI_INFO_NULL;
    MPI_Comm comm = MPI_COMM_WORLD;

    auto fd = h5::create("out.h5", H5F_ACC_TRUNC,
                h5::default_fcpl, h5::mpiio({comm, info}));
    h5::ds_t ds = h5::create<float>(fd, "/dset",
        h5::max_dims{16*256, 16*256, H5S_UNLIMITED}, h5::chunk{256,256,256}, h5::collective);
    // the `ds` descriptor is binary compatible with `hid_t`, so you may freely pass it
    // to the appropriate HDF5 C API routines; or just use H5CPP and make it even simpler
    MPI_Finalize();
    return 0;
}

Did I mention there is full support for the major linear algebra libraries, for std::vector<T>, and of course for typed memory pointers and raw-pointer casts?

best wishes: steven


#4

Could you please follow the suggestion to close all handles, switch to Open MPI version 4.0.0 or later and use HDF5 1.10.7? Early versions of Open MPI had some bugs that caused HDF5 to fail.

Thank you!

Elena


#5

My colleague just confirmed that your program works on our Linux 3.10.0 server with HDF5 1.10.5 and Open MPI 3.1.3. That is, we need to look into your development environment.

Did you build HDF5 1.10.5 against Open MPI 3.1.3 yourself, and did you run all the tests? Were there any failures? In any case, I would start with more recent versions of HDF5 and Open MPI.
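For reference, this is roughly how one builds a parallel HDF5 from source and runs its test suite (the compiler wrapper and install prefix below are just placeholders, adjust them to your setup):

    # from an unpacked HDF5 source tree; CC must be the wrapper of the
    # Open MPI you want to test against
    export CC=mpicc
    ./configure --enable-parallel --prefix="$HOME/hdf5-parallel"
    make -j4
    make check        # runs the serial and parallel test suites
    make install

`make check` is the part that would surface an incompatibility between the HDF5 release and the MPI library underneath it.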

Thank you!
Elena


#6

Thanks @gheber, I had those lines in my code; I just removed them here to save space. However, they do not help with the issue, since the failure happens at H5Dcreate().


#7

Hi @steven, thanks for the suggestion. Our code base is already C++, though we use a different HDF5 wrapper. However, the dataset creation error is coming from the underlying library, so I made sure to reproduce it here with pure C code. Cheers!


#8

Thanks, @epourmal. We’ll try to upgrade MPI and use the latest HDF5 and see if it solves the problem. Cheers!


#9

Hi @sergio, I had only noticed @gheber's message and didn't quite have time to look through the thread; hence the cowboyish shot from the hip. Keeping it brief, if it is not too impolite to ask: what do you like most about the C++ library you are using?

steve


#10

Not Canada Dry, I suspect :face_with_raised_eyebrow:


#11

Hey @steven, no problem at all! As for your question, the attractive features of the library I'm using (HighFive) are (i) being written in modern generic C++, (ii) having an easy-to-use API, (iii) supporting parallel HDF5 writing, and (iv) being header-only. Now, to be honest, I did not do a whole lot of qualification or comparison, so I'm pretty sure there are others out there that also fulfill those requirements.


#12

@sergio Thanks! Yeah those are important features… anything you would add or find lacking?


#13

@steven Not really, it has everything I needed so far. If I find something I can let you know.


#14

Hi @epourmal, I tried the above code with Open MPI 4.1.1, and it works! I was already using HDF5 1.10.5. Thanks for the recommendation!