The error message indicates that a hyperslab selection goes beyond the dataset extent.
Please make sure that you are using the correct values for the start, stride, count, and block parameters in the H5Sselect_hyperslab call (if you use it!). It will help if you provide an excerpt from your code that selects hyperslabs for each process.
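For a 1-D dataset, the last element of the selection must stay inside the extent. A minimal sketch of the kind of per-process selection I have in mind (mpi_rank, mpi_size, block, and count are placeholders, not taken from your code):

#include <hdf5.h>

/* Sketch: each rank selects `count` strided blocks of length `block` from a
 * 1-D dataset.  The last selected element, start + (count-1)*stride + (block-1),
 * must be smaller than the file extent, or H5Dread/H5Dwrite fails with
 * "selection+offset not within extent". */
herr_t select_rank_hyperslab(hid_t filespace, hsize_t extent,
                             int mpi_rank, int mpi_size,
                             hsize_t block, hsize_t count)
{
    hsize_t start  = (hsize_t)mpi_rank * block;   /* first block owned by this rank */
    hsize_t stride = (hsize_t)mpi_size * block;   /* distance between my blocks     */
    hsize_t last   = start + (count - 1) * stride + (block - 1);

    if (last >= extent)
        return -1;  /* selection would go past the dataset extent */

    return H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                               &start, &stride, &count, &block);
}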
···
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On May 28, 2015, at 1:46 PM, Brandon Barker <brandon.barker@cornell.edu> wrote:
I believe I've gotten a bit closer by using chunked datasets <https://github.com/cornell-comp-internal/CR-demos/blob/bc507264fe4040d817a2e9603dace0dc06585015/demos/pHDF5/perfectNumbers.c>, but I'm now not sure how to get past this:
[brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 ./perfectNumbers
m, f, count,: 840, 1680, 84
m, f, count,: 840, 1680, 84
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
#000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
major: Dataspace
minor: Out of range
perfectNumbers: perfectNumbers.c:399: restore: Assertion `status != -1' failed.
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 28420 on node euca-128-84-11-180 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
m, f, and count are the memory space length, the file dataspace length, and the number of strided segments to be read in; prior to setting the extent as follows, I would get the error when f was not a multiple of m:
dimsf[0] = dimsm[0] * mpi_size;
H5Dset_extent(dset_id, dimsf);
Now that I am using these <https://github.com/cornell-comp-internal/CR-demos/blob/bc507264fe4040d817a2e9603dace0dc06585015/demos/pHDF5/perfectNumbers.c#L351>, I see that it hasn't helped the issue, so there must be something else I still need to do.
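In case it clarifies what I mean, this is roughly the sanity check I'm thinking of adding on each rank before the read (a sketch, not code that is in the repo yet):

#include <hdf5.h>
#include <stdio.h>

/* Sketch: compare the dataset's actual extent with the extent this rank
 * assumed when it computed its hyperslab, so a mismatch shows up before
 * H5Dread complains about "selection+offset not within extent". */
void check_extent(hid_t dset_id, hsize_t expected_extent, int mpi_rank)
{
    hsize_t dims[1];
    hid_t   filespace = H5Dget_space(dset_id);

    H5Sget_simple_extent_dims(filespace, dims, NULL);
    if (dims[0] != expected_extent)
        fprintf(stderr, "rank %d: dataset extent %llu, expected %llu\n",
                mpi_rank, (unsigned long long)dims[0],
                (unsigned long long)expected_extent);
    H5Sclose(filespace);
}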
Incidentally, I was looking at this example <https://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/examples/h5_extend.c> and am not sure what the point of the following code is, since rank_chunk is never used:
if (H5D_CHUNKED == H5Pget_layout (prop))
    rank_chunk = H5Pget_chunk (prop, rank, chunk_dimsr);
I guess it is just to demonstrate the function call of H5Pget_chunk?
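If the values were actually used, I'd imagine something like this (my guess, not code from h5_extend.c; prop, rank, and chunk_dimsr are the same variables as in the example above):

/* Guess at how the retrieved chunk info could be used: size a memory
 * dataspace to one chunk. */
if (H5D_CHUNKED == H5Pget_layout (prop)) {
    rank_chunk = H5Pget_chunk (prop, rank, chunk_dimsr);  /* returns the chunk rank */
    if (rank_chunk > 0) {
        hid_t memspace = H5Screate_simple (rank_chunk, chunk_dimsr, NULL);
        /* ... read or write one chunk's worth of data with memspace ... */
        H5Sclose (memspace);
    }
}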
On Thu, May 28, 2015 at 10:27 AM, Brandon Barker <brandon.barker@cornell.edu> wrote:
Hi All,
I have fixed (and pushed the fix for) one bug related to an improperly defined count in the restore function. I still have issues when m != n:
#000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
major: Dataspace
minor: Out of range
I believe this indicates that I need to use chunked datasets so that the dataset can grow dynamically.
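For reference, what I have in mind is roughly the following (a sketch only; the dataset name "state" and the chunk length are placeholders, not my actual checkpoint code):

#include <hdf5.h>

/* Sketch: a 1-D chunked dataset with unlimited maximum size, so that
 * H5Dset_extent can grow (or shrink) it on a later run. */
hid_t create_extendible_dset(hid_t file_id, hsize_t initial_len, hsize_t chunk_len)
{
    hsize_t dims[1]    = { initial_len };
    hsize_t maxdims[1] = { H5S_UNLIMITED };
    hsize_t chunk[1]   = { chunk_len };

    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    hid_t dset = H5Dcreate2(file_id, "state", H5T_NATIVE_LONG, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}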
On Wed, May 27, 2015 at 5:03 PM, Brandon Barker <brandon.barker@cornell.edu> wrote:
Hi All,
I've been learning pHDF5 by way of developing a toy application that checkpoints and restores its state. The restore function was the last to be implemented, and after writing it I realized I have an issue: since each process is responsible for strided blocks of data, the blocks saved during one run may not divide evenly among the processes of another run, because the mpi_size of the latter run may not evenly divide the total number of blocks.
I was hoping that a fill value might save me here, so that I would simply read in 0s when reading beyond the end of the dataset, although I believe I saw a page noting that this isn't possible for contiguous datasets.
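If fill values do turn out to work with chunked storage, I imagine the relevant piece would look something like this (a sketch; the zero fill and the 1024 chunk length are placeholders):

#include <hdf5.h>

/* Sketch: chunked layout plus a zero fill value on the dataset creation
 * property list; my understanding is that chunks never written then read
 * back as 0, which is the behaviour I was hoping to rely on. */
hid_t make_zero_filled_dcpl(void)
{
    hsize_t chunk[1] = { 1024 };
    long    fill     = 0;
    hid_t   dcpl     = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_fill_value(dcpl, H5T_NATIVE_LONG, &fill);
    return dcpl;
}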
The good news is that since I'm working with 1-dimensional data, it is fairly easy to refactor the relevant code.
The error I get emits this message:
[brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 perfectNumbers
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
#000: ../../src/H5Dio.c line 179 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: ../../src/H5Dio.c line 446 in H5D__read(): src and dest data spaces have different sizes
major: Invalid arguments to routine
minor: Bad value
perfectNumbers: perfectNumbers.c:382: restore: Assertion `status != -1' failed.
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3717 on node euca-128-84-11-180 exited on signal 11 (Segmentation fault).
Here is the offending line <https://github.com/cornell-comp-internal/CR-demos/blob/3d7ac426b041956b860a0b83c36b88024a64ac1c/demos/pHDF5/perfectNumbers.c#L380> in the restore function; you can observe the checkpoint function to see how things are written out to disk.
General pointers are appreciated as well - to paraphrase the problem more simply: I have a distributed (strided) array that I write out to disk as a dataset among n processes, and when I restart the program, I may want to divvy up the data among m processes in similar data structures as before, but now m != n. Actually, my problem may be different from just this, since I seem to get the same issue even when m == n ... hmm.
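To make the m != n case concrete (ignoring the strided layout for a moment and treating the data as contiguous blocks), the arithmetic I think each reading rank needs is roughly this sketch:

#include <hdf5.h>

/* Sketch: divide `total` stored elements among `size` ranks when `total`
 * need not be a multiple of `size`; the first total % size ranks get one
 * extra element.  `total` would come from the dataset's current extent. */
void my_share(hsize_t total, int rank, int size, hsize_t *offset, hsize_t *count)
{
    hsize_t base  = total / (hsize_t)size;
    hsize_t extra = total % (hsize_t)size;
    hsize_t r     = (hsize_t)rank;

    *count  = base + (r < extra ? 1 : 0);
    *offset = r * base + (r < extra ? r : extra);
}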
Thanks,
--
Brandon E. Barker
http://www.cac.cornell.edu/barker/