While writing a significant amount of data in parallel, I obtain the
following error stack:
HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 66:
  #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 453 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: H5O.c line 3016 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 293 in H5O__dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1060 in H5D__create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: H5Dint.c line 852 in H5D__update_oh_info(): unable to update layout/pline/efl header message
    major: Dataset
    minor: Unable to initialize object
  #011: H5Dlayout.c line 238 in H5D__layout_oh_create(): unable to initialize storage
    major: Dataset
    minor: Unable to initialize object
  #012: H5Dint.c line 1713 in H5D__alloc_storage(): unable to initialize dataset with fill value
    major: Dataset
    minor: Unable to initialize object
  #013: H5Dint.c line 1805 in H5D__init_storage(): unable to allocate all chunks of dataset
    major: Dataset
    minor: Unable to initialize object
  #014: H5Dchunk.c line 3575 in H5D__chunk_allocate(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #015: H5Dchunk.c line 3745 in H5D__chunk_collective_fill(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #016: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #017: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #018: H5FDint.c line 260 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #020: H5FDmpio.c line 1846 in H5FD_mpio_write(): Other I/O error , error stack:
ADIOI_NFS_WRITESTRIDED(672): Other I/O error File too large
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
It basically claims that I am creating a file that is too large, but I
verified that the filesystem can handle such a size. In my case, the
file is around 4 TB when it crashes. Where could this problem come
from? I thought HDF5 had no problem with very large files. Moreover, I
am dividing the file into several datasets, and the write operations
work perfectly until, at some point, it crashes with the errors above.
Could it be an issue with HDF5? Or could it be an MPI limitation? I am
skeptical about the latter option: at the beginning, the program writes
several datasets into the file successfully (all the datasets being
the same size). If MPI were to blame, why wouldn't it crash at the
first write?
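For reference, the failing call is the dataset creation itself. The
write pattern is roughly the sketch below; the file name, dataset
names, extents, and chunk shapes are placeholders rather than my actual
code, but it is the same kind of collective creation of a chunked
dataset through the MPI-IO driver that appears in the stack above:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Open one shared file through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Each dataset is large and chunked.  In parallel HDF5, chunk storage
     * is allocated at creation time, so H5Dcreate2() collectively writes
     * the fill value to every chunk -- the H5D__chunk_collective_fill()
     * step that fails in the stack above. */
    hsize_t dims[2]  = {1048576, 4096};   /* placeholder extents */
    hsize_t chunk[2] = {1024, 4096};      /* placeholder chunk shape */
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    hid_t dset = H5Dcreate2(file, "/data_042", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ...each rank then selects its hyperslab and calls H5Dwrite()
     * collectively; several datasets of this size are written in turn... */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}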
Hi Frederic,
Could you give us some more details about your file and the call(s) you are making to HDF5? I can’t think of any reason that it would crash when creating a file like this, but something interesting could be going on…
Depending on how new his MPI implementation is, it might not have all
the 64-bit cleanups in the NFS path.
The final error in the trace says "File too large", but what it might
mean is "I/O request too big".
If you write to something that is not NFS, I think you'll find this
problem goes away. I have a bit more information: I neglected NFS back
then and did not update that driver until earlier this year.
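If you want to double-check where the file actually lands, a quick
Linux-only probe like the sketch below (plain statfs(), nothing HDF5-
or ROMIO-specific, and the path argument is just an example) will tell
you whether the target directory is NFS-mounted:

#include <stdio.h>
#include <sys/vfs.h>      /* statfs() */
#include <linux/magic.h>  /* NFS_SUPER_MAGIC */

int main(int argc, char **argv)
{
    /* Pass the directory your HDF5 file is written to,
     * e.g. ./check_fs /scratch/run42 */
    const char *path = (argc > 1) ? argv[1] : ".";
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        perror("statfs");
        return 1;
    }
    printf("%s is %son NFS (f_type = 0x%lx)\n",
           path, fs.f_type == NFS_SUPER_MAGIC ? "" : "not ",
           (unsigned long)fs.f_type);
    return 0;
}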
==rob
On Mon, 2017-08-07 at 09:14 -0500, Quincey Koziol wrote:
Quincey
> On Aug 7, 2017, at 5:28 AM, Frederic Perez <fredericperez1@gmail.com> wrote:
>
> Hi,
>
> While writing significant amount of data in parallel, I obtain the
> following error stack:
>
> HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 66:
> #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
> major: Dataset
> minor: Unable to initialize object
> #001: H5Dint.c line 453 in H5D__create_named(): unable to create
> and
> link to dataset
> major: Dataset
> minor: Unable to initialize object
> #002: H5L.c line 1638 in H5L_link_object(): unable to create new
> link to object
> major: Links
> minor: Unable to initialize object
> #003: H5L.c line 1882 in H5L_create_real(): can't insert link
> major: Symbol table
> minor: Unable to insert object
> #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path
> traversal failed
> major: Symbol table
> minor: Object not found
> #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal
> operator failed
> major: Symbol table
> minor: Callback failed
> #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
> major: Object header
> minor: Unable to initialize object
> #007: H5O.c line 3016 in H5O_obj_create(): unable to open object
> major: Object header
> minor: Can't open object
> #008: H5Doh.c line 293 in H5O__dset_create(): unable to create
> dataset
> major: Dataset
> minor: Unable to initialize object
> #009: H5Dint.c line 1060 in H5D__create(): can't update the
> metadata cache
> major: Dataset
> minor: Unable to initialize object
> #010: H5Dint.c line 852 in H5D__update_oh_info(): unable to update
> layout/pline/efl header message
> major: Dataset
> minor: Unable to initialize object
> #011: H5Dlayout.c line 238 in H5D__layout_oh_create(): unable to
> initialize storage
> major: Dataset
> minor: Unable to initialize object
> #012: H5Dint.c line 1713 in H5D__alloc_storage(): unable to
> initialize dataset with fill value
> major: Dataset
> minor: Unable to initialize object
> #013: H5Dint.c line 1805 in H5D__init_storage(): unable to
> allocate
> all chunks of dataset
> major: Dataset
> minor: Unable to initialize object
> #014: H5Dchunk.c line 3575 in H5D__chunk_allocate(): unable to
> write
> raw data to file
> major: Low-level I/O
> minor: Write failed
> #015: H5Dchunk.c line 3745 in H5D__chunk_collective_fill(): unable
> to write raw data to file
> major: Low-level I/O
> minor: Write failed
> #016: H5Fio.c line 171 in H5F_block_write(): write through
> metadata
> accumulator failed
> major: Low-level I/O
> minor: Write failed
> #017: H5Faccum.c line 825 in H5F__accum_write(): file write failed
> major: Low-level I/O
> minor: Write failed
> #018: H5FDint.c line 260 in H5FD_write(): driver write request
> failed
> major: Virtual File Layer
> minor: Write failed
> #019: H5FDmpio.c line 1846 in H5FD_mpio_write():
> MPI_File_write_at_all failed
> major: Internal error (too specific to document in detail)
> minor: Some MPI function failed
> #020: H5FDmpio.c line 1846 in H5FD_mpio_write(): Other I/O error ,
> error stack:
> ADIOI_NFS_WRITESTRIDED(672): Other I/O error File too large
> major: Internal error (too specific to document in detail)
> minor: MPI Error String
>
>
> It basically claims that I am creating a file too large. But I
> verified that the filesystem is capable of handling such a size. In
> my
> case, the file is around 4 TB when it crashes. Where could this
> problem come from? I thought HDF5 does not have a problem with very
> large files. Plus, I am dividing the file in several datasets, and
> the
> write operations work perfectly until, at some point, it crashes
> with
> the errors above.
>
> Could it be an issue with HDF5? Or could it be an MPI limitation? I
> am
> skeptical about the latter option: at the beginning, the program
> writes
> several datasets inside the file successfully (all the datasets
> being
> the same size). If MPI was to blame, why wouldn't it crash at the
> first write?
>
> Thank you for your help.
> Fred
>