H5i_dec_ref hangs


#21

Hi Steven,

Thanks for the reply - the size of the data set can be seen in the example above: 1489 x 2048 x 2.

The correct algorithm is below - I think the problematic parts might be the attributes or the region reference datasets.

If rank 0 -> Open H5 file in serial mode.
If rank 0 -> create 10 groups
If rank 0 -> inside each group, create a float type dataset (10 in total)
If rank 0 -> inside each group, create a region reference type dataset (10 in total), one dimension, size 10
If rank 0 -> add attributes to the datasets above (vlen strings)
If rank 0 -> Close H5 file in serial mode.
Barrier
Open the file in parallel mode.
Each rank writes to its own float type dataset (rank = index of the dataset).
Close the file in parallel mode.
If rank 0 -> open file in serial mode
If rank 0 -> for each group -> take the region reference dataset and write into it a 64x64x1 region reference pointing at a float type dataset (just an example; I think it doesn't matter which dataset you choose for the region reference)
If rank 0 -> close the file

In Python, this gets stuck on “Close the file in parallel mode”.
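For concreteness, a rough sketch of just that parallel step with the plain HDF5 C API is below - the file name, the per-rank dataset path and the independent transfer mode are my assumptions from the list above, not code from the actual project:

#include <mpi.h>
#include <hdf5.h>
#include <string>
#include <vector>

// sketch: open the already-created file with the MPI-IO driver, let each rank
// write its own float dataset independently, then close the file
void parallel_write_step(MPI_Comm comm, const char* filename, int rank) {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);              // parallel (MPI-IO) file access
    hid_t fd = H5Fopen(filename, H5F_ACC_RDWR, fapl);

    std::string name = std::to_string(rank) + "/dataset";     // placeholder path: each rank touches only its own dataset
    hid_t ds = H5Dopen(fd, name.c_str(), H5P_DEFAULT);

    std::vector<float> data(1489 * 2048 * 2, 1.0f);            // the whole dataset, written in one shot
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);             // independent, not collective, transfer
    H5Dwrite(ds, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, dxpl, data.data());

    H5Pclose(dxpl); H5Dclose(ds); H5Pclose(fapl);
    H5Fclose(fd);                                              // <- the step that hangs in the Python run
}

Note that in parallel HDF5 both H5Fopen and H5Fclose are collective, so every rank must reach them; if one rank errors out or returns early, the remaining ranks wait at the close forever.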

Cheers,

Jiri


#22

Do the two datasets that you are trying to write exist in the file? What is their layout? Have they been allocated? (For example, contiguous layout uses late allocation: creating such a dataset without writing any dataset elements does not allocate any storage for the dataset elements in the file.)
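For example, something along these lines answers those questions from code (plain HDF5 C API; fd and the dataset name are placeholders):

#include <hdf5.h>
#include <cstdio>

// print whether a dataset exists, what its layout is, and how much storage has
// actually been allocated for it; 0 bytes on a contiguous dataset usually means
// nothing has been written to it yet
void inspect(hid_t fd, const char* name) {
    if (H5Lexists(fd, name, H5P_DEFAULT) <= 0) { std::printf("%s: does not exist\n", name); return; }
    hid_t ds   = H5Dopen(fd, name, H5P_DEFAULT);
    hid_t dcpl = H5Dget_create_plist(ds);
    H5D_layout_t layout = H5Pget_layout(dcpl);        // H5D_COMPACT / H5D_CONTIGUOUS / H5D_CHUNKED
    hsize_t bytes = H5Dget_storage_size(ds);          // storage actually allocated in the file
    std::printf("%s: layout=%d, allocated=%llu bytes\n", name, (int)layout, (unsigned long long)bytes);
    H5Pclose(dcpl);
    H5Dclose(ds);
}

If late allocation turns out to be the problem, calling H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY) on the dataset creation property list forces the storage to be allocated when the dataset is created.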

G.


#23

At first I didn’t quite get your problem; I had to peruse the thread to see that you want this to scale to millions of files. I took on the challenge and added reference interface calls to H5CPP; to test this feature you have to check out the repository:

git clone https://github.com/steven-varga/h5cpp.git
cd h5cpp && git checkout master
make install   # or just copy/link the 'h5cpp' directory to '/usr/local/include'
cd examples/reference && make clean && reset && make

The implementation allows arbitrary shapes, but the h5::exp::read and h5::exp::write interface – which dereferences the regions for you and does the IO ops – allows only a single block selection.
Bob’s your uncle! Easy, pythonic selection for HDF5 with modern C++. Give me a few days to check your original question.

This is how the syntax looks:

 
#include <armadillo>
#include <vector>
#include <h5cpp/all>

int main(){
    h5::fd_t fd = h5::create("ref.h5", H5F_ACC_TRUNC);
	{
        h5::ds_t ds = h5::create<float>(fd,"01",  
            h5::current_dims{10,20}, h5::chunk{2,2} | h5::fill_value<float>{1} );
        
        h5::reference_t ref = h5::reference(fd, "01", h5::offset{2,2}, h5::count{4,4});
        h5::write(fd, "single reference", ref);
        /* you can factor out `count` this way :  h5::count count{2,2};  */ 
        std::vector<h5::reference_t> idx {
            // The HDF5 CAPI requires fd + dataset name, instead of hid_t to ds: wishy-washy
            h5::reference(fd, "01", h5::offset{2,2}, h5::count{4,4}),
            h5::reference(fd, "01", h5::offset{4,8}, h5::count{1,1}),
            h5::reference(fd, "01", h5::offset{6,12}, h5::count{3,3}),
            h5::reference(fd, "01", h5::offset{8,16}, h5::count{2,1})
        };
        // dataset shape can be controlled with dimensions, in this case it is 2x2
        // and is not related to the selected regions!!!
        // data type is H5R_DATASET_REGION when dataspace is provided, otherwise OBJECT
        h5::write(fd, "index", idx, h5::current_dims{2,2}, h5::max_dims{H5S_UNLIMITED, 2});
    }
    { // we are going to update the regions referenced by the set of region-references
      // stored in "index"
        h5::ds_t ds = h5::open(fd, "index");
        std::vector<float> color(50, 9);
        // read back the stored references, then write `color` into each referenced region
        for(auto& ref: h5::read<std::vector<h5::reference_t>>(ds))
            h5::exp::write(ds, ref, color.data());
    }

    { // we are reading back data from the regions; now they must all have the 'color' value 9
        h5::ds_t ds = h5::open(fd, "index");
        // dereference each stored reference and read the selected region back
        for(auto& ref: h5::read<std::vector<h5::reference_t>>(ds)){
            arma::fmat mat = h5::exp::read<arma::fmat>(ds, ref);
            std::cout << mat << "\n";
        }
    }
    { // for verification
        std::cout << h5::read<arma::fmat>(fd, "01") << "\n\n";

    }
}

dump:


h5dump ref.h5
HDF5 "ref.h5" {
GROUP "/" {
   DATASET "01" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
      DATA {
      (0,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (1,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (2,0): 1, 1, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (3,0): 1, 1, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (4,0): 1, 1, 9, 9, 9, 9, 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (5,0): 1, 1, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      (6,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9, 1, 1, 1, 1, 1,
      (7,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9, 1, 1, 1, 1, 1,
      (8,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9, 1, 9, 1, 1, 1,
      (9,0): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1
      }
   }
   DATASET "index" {
      DATATYPE  H5T_REFERENCE { H5T_STD_REF_DSETREG }
      DATASPACE  SIMPLE { ( 2, 2 ) / ( H5S_UNLIMITED, 2 ) }
      DATA {
         DATASET "/01" {
            REGION_TYPE BLOCK  (2,2)-(5,5)
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
         }
         DATASET "/01"  {
            REGION_TYPE BLOCK  (4,8)-(4,8)
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
         }
         DATASET "/01"  {
            REGION_TYPE BLOCK  (6,12)-(8,14)
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
         }
         DATASET "/01"  {
            REGION_TYPE BLOCK  (8,16)-(9,16)
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
         }
      }
   }
   DATASET "single reference" {
      DATATYPE  H5T_REFERENCE { H5T_STD_REF_DSETREG }
      DATASPACE  SCALAR
      DATA {
         DATASET "/01" {
            REGION_TYPE BLOCK  (2,2)-(5,5)
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SIMPLE { ( 10, 20 ) / ( 10, 20 ) }
         }
      }
   }
}
}

steve


#24

There is no problem with the C API calls: I executed the proposed algorithm on 8 cores (I only had a laptop at hand) and it did the job. The project is uploaded to my GitHub page; in the makefile please adjust srun -n 8 -w io ./mpi-reference-test to srun -n 8 ./mpi-reference-test, i.e. remove the -w io part.
Note: instead of 10 groups, etc., I altered your spec to use the number of cores, which was 8 on my laptop.

Here is the software:

/* copyright steven varga, vargaconsulting 2021, june 08, Toronto, ON, Canada; MIT license */

#include <mpi.h>  /* MUST precede H5CPP includes */
#include <h5cpp/all>
#include <string>
#include <vector>
#include <fmt/format.h>


constexpr hsize_t m=1489, n=2048, k=2;
constexpr const char* filename = "mpi-reference.h5";
constexpr float data_value = 3.0;

int main(int argc, char **argv) {

    int rank_size, current_rank;
    MPI_Init(NULL, NULL);
    MPI_Info info = MPI_INFO_NULL;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Comm_size(comm, &rank_size);
    MPI_Comm_rank(comm, &current_rank);
    
    if(current_rank == 0) { // serial mode, ran on rank 0
        h5::fd_t fd = h5::create(filename, H5F_ACC_TRUNC);
        for(int i=0; i<rank_size; i++){
            // this is your dataset:
            h5::ds_t ds = h5::create<float>(fd, fmt::format("{:03d}/dataset", i), h5::chunk{149,256,1}, h5::current_dims{m,n,k});
            // VL string attribute written to it
            ds["attribute name"] = "this is a vl string attribute";
            // and a region reference dataset of length rank_size -- which is 10 in the description
            h5::create<h5::reference_t>(fd, fmt::format("{:03d}/reference", i), h5::current_dims{static_cast<hsize_t>(rank_size)}, 
                h5::chunk{2});
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);

    { // parallel mode
        h5::fd_t fd = h5::open(filename, H5F_ACC_RDWR, h5::mpiio({MPI_COMM_WORLD, info}));
        h5::write(fd,  fmt::format("{:03d}/dataset",current_rank), 
            std::vector<float>(m*n*k, 1.0), // <-- single shot write: dataset size must match file space size 
            h5::independent);               // collective | independent
    }
    MPI_Barrier(MPI_COMM_WORLD);

    if(current_rank == 0) { // serial mode, ran on rank 0
        // the in-memory buffer must cover at least the referenced region, or bad things happen
        std::vector<float> in_memory_values(64*64, data_value);

        h5::fd_t fd = h5::open(filename, H5F_ACC_RDWR);
        for(int i=0; i<rank_size; i++){
            h5::reference_t reference = h5::reference(fd, // CREATE a reference
                fmt::format("{:03d}/dataset", i),         // dataset the region is within
                h5::offset{1, 1, 0},                      // location of the region 
                h5::count{64,64,1});                      // size of region 
            // write data into the referenced region; the memory space of the passed pointer must match the region
            h5::exp::write(fd, fmt::format("{:03d}/dataset", i), reference, in_memory_values.data());
            // don't forget to write the REFERENCE into the reference dataset for later use;
            // NOTE: the order of the data and reference IO calls is arbitrary
            h5::write(fd, fmt::format("{:03d}/reference", i), 
                reference, 
                h5::count{1}, h5::offset{0} ); // <-- which cell you update within file space
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    
    return 0;
}

PS: don't forget to pull the h5cpp GitHub repo, as I added a write IO call variant.