Problems with HDF5 on a Lustre filesystem

Hello everyone,

I run simulations on a cluster (using OpenMPI) with a Lustre filesystem,
and I use HDF5 1.8.9 for data output. Each process has its own file, so
I believe there is no need for the parallel HDF5 version; is this correct?

When a larger number of processes (> 4) want to dump their data at the same
time, I get various errors: paths or objects not found, or some other
operation failing. I can't really make out the reason for it, as the
code works fine on my personal workstation and runs for days, with writes
and reads every 5 minutes, without failing.

What I have tried so far is having one process manage all the read/write
operations, so that every other process has to check whether anyone else
is already dumping its data. I also implemented
boost::interprocess::file_lock to prevent two processes from writing to the
same file (a simplified sketch is below); the queuing system already rules
that out anyway, so this was more of a paranoid move to be absolutely sure.
All of that reduced the number of fatal errors significantly, but did not
completely get rid of them. The biggest problem is that some of the files
get corrupted when the program crashes, which is especially inconvenient.
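
The locking part is roughly the sketch below; this is simplified and not the
actual simulation code, and the lock-file name and the guarded section are
placeholders:

==================================================================

#include <fstream>
#include <string>
#include <boost/interprocess/sync/file_lock.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

// Simplified sketch only: serialize dumps to one HDF5 file across processes.
// The lock-file name and the guarded body are placeholders.
void GuardedDump(const std::string& h5name)
{
    const std::string lockname = h5name + ".lock";
    {
        // boost::interprocess::file_lock requires the file to exist already.
        std::ofstream touch(lockname.c_str(), std::ios::app);
    }

    boost::interprocess::file_lock flock(lockname.c_str());
    boost::interprocess::scoped_lock<boost::interprocess::file_lock> guard(flock);

    // ... open the HDF5 file and write while the lock is held ...
}   // guard releases the lock here

==================================================================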

My question is whether there is any obvious mistake I am making, and how I
would go about solving this issue. My initial guess is that the Lustre
filesystem plays some role in this, since it is the only difference from
my personal computer, where everything runs smoothly. As I said, neither
the error messages nor the traceback show any consistency.

bye, Peter

Hi Peter,

The problem does sound strange.
I do not understand why file locking helped reduce errors. I thought you said each process writes to its own file anyway, so locking the file or having one process manage the reads/writes should not matter.

Is it possible you could send me a piece of code from your simulation that performs the I/O, so that I can look at it and diagnose further?
A program that I can run and that replicates the problem (on Lustre) would be great. If that is not possible, then please just describe or copy-paste how you are calling into the HDF5 library for your I/O.

Thanks,
Mohamad

Hi Mohamad,

thanks for your reply. The reason I suspected Lustre of being the
culprit is simply that the error does not appear on my personal
computer. I thought that maybe the files are written or opened too fast,
or too many at the same time, for Lustre's synchronization to handle.

I am inserting various pieces of code that show how I am calling the
HDF5 library. Any comment on proper ways of doing so is much appreciated!

To open the file, I use the following code:

==================================================================

int H5Interface::OpenFile (std::string filename, int flag) {

    bool tried_once = false;

    struct timespec timesp;
    timesp.tv_sec = 0;
    timesp.tv_nsec = 200000000;

    for (int tries = 0; tries < 300; tries++) {
        try {
            H5::Exception::dontPrint();
            if(flag == 0) {
                file = H5::H5File (filename, H5F_ACC_TRUNC);
            } else if (flag == 1) {
                file.openFile(filename, H5F_ACC_RDONLY);
            } else if (flag == 2) {
                file.openFile(filename, H5F_ACC_RDWR);
            }

            if (tried_once) {
                std::cout << "Opening " << filename << " succeded after "
                          << tries << " several tries" << std::endl;
            }
            return 0;

        } catch( FileIException error ) {
            tried_once = true;
        }

        catch( DataSetIException error ) {
            tried_once = true;
        }

        catch( DataSpaceIException error ) {
            tried_once = true;
        }
        nanosleep(&timesp, NULL);
    }
    std::cerr << "H5Interface:\tOpening " << filename << " failed";
    return -1;
}

It often happens that opening a file succeeds only after 1 or 2 tries.

I write and read strings like this:

==================================================================

int H5Interface::WriteString(std::string path, std::string value) {
    try {
        H5::Exception::dontPrint();
        H5::StrType str_t(H5::PredType::C_S1, H5T_VARIABLE);
        H5std_string str (value);
        hsize_t dims[1] = { 1 };
        H5::DataSpace str_space(uint(1), dims, NULL);
        H5::DataSet str_set;
        if (H5Lexists(file.getId(), path.c_str(), H5P_DEFAULT)) {
            str_set = file.openDataSet(path);
        } else {
            str_set = file.createDataSet(path, str_t, str_space);
        }
        str_set.write (str, str_t);
        str_set.close();
    }
    catch( FileIException error ) {
        // error.printError();
        return -1;
    }

    catch( DataSetIException error ) {
        // error.printError();
        return -1;
    }

    catch( DataSpaceIException error ) {
        // error.printError();
        return -1;
    }
    return 0;
}

==================================================================

int H5Interface::ReadString(std::string path, std::string * data) {
    try {
        H5::Exception::dontPrint();
        if (H5Lexists(file.getId(), path.c_str(), H5P_DEFAULT)) {
            H5::StrType str_t(H5::PredType::C_S1, H5T_VARIABLE);
            H5std_string str;
            H5::DataSet str_set = file.openDataSet(path);
            str_set.read (str, str_t);
            str_set.close();
            *data = std::string(str);
        }
    }
    catch( FileIException error ) {
        // error.printError();
        return -1;
    }

    catch( DataSetIException error ) {
        // error.printError();
        return -1;
    }

    catch( DataSpaceIException error ) {
        // error.printError();
        return -1;
    }
    return 0;
}

And finally for writing and reading boost::multi_arrays, for example:

==================================================================

int H5Interface::Read2IntMultiArray(std::string path,
                                    boost::multi_array<int,2>& data) {
    try {
        H5::DataSet v_set = file.openDataSet(path);
        H5::DataSpace space = v_set.getSpace();
        hsize_t dims[2];

        int rank = space.getSimpleExtentDims( dims );

        DataSpace mspace(rank, dims);
        int data_out[dims[0]][dims[1]];
        data.resize(boost::extents[dims[0]][dims[1]]);
        v_set.read( data_out, PredType::NATIVE_INT, mspace, space );
        for (int i = 0; i < int(dims[0]); i++) {
            for (int j = 0; j < int(dims[1]); j++) {
                data[i][j] = data_out[i][j];
            }
        }
        v_set.close();
    }
    [...]

==================================================================

int H5Interface::WriteIntMatrix(std::string path, uint rows,
                                 uint cols, int * data) {
    try {
        H5::Exception::dontPrint();
        hsize_t dims_m[2] = { rows, cols };
        H5::DataSpace v_space (2, dims_m);
        H5::DataSet v_set;
        if (H5Lexists(file.getId(), path.c_str(), H5P_DEFAULT)) {
            v_set = file.openDataSet(path);
        } else {
            v_set = file.createDataSet(path, H5::PredType::NATIVE_INT,
                                       v_space);
        }
        v_set.write(data, H5::PredType::NATIVE_INT);
        v_set.close();
    }
    [...]

As far as the workflow goes, a scheduler provides the basic h5 file with
all the parameters and tells the workers to load this file and then put
their measurements in. So they are enlarging the file as time goes by.
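
For illustration, one worker step using the methods above looks roughly like
this; the file name, dataset paths and sizes are made up, and I am assuming
the datasets sit directly under the root group:

==================================================================

// Illustrative usage only; file name, paths and dimensions are made up.
// Assumes the H5Interface class and the boost::multi_array includes from above.
void WorkerStep()
{
    const uint rows = 4, cols = 8;
    boost::multi_array<int,2> counts(boost::extents[rows][cols]);
    // ... fill counts with this step's measurements ...

    H5Interface h5;
    if (h5.OpenFile("run_0001.h5", 2) == 0) {   // flag 2 -> H5F_ACC_RDWR
        h5.WriteString("/status_worker_17", "running");
        h5.WriteIntMatrix("/counts_step_0042", rows, cols, counts.data());
    }
}

==================================================================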

Have a nice day, Peter

Hi Peter,

Yes, nothing seems unusual to me in the case where each process accesses its own file.
Since you mentioned that this works on your local filesystem, did you try to check the structure of the files and make sure that they are correct there (using h5dump or other tools)?
Otherwise I'm not sure what could be wrong. I'm not familiar with C++ either, so someone else may have other comments.

Mohamad

Hi Mohamad,

thanks again for your help! Once the program failed, the structure was
corrupted and h5dump was unable to read the file. Any other file that
was also open at the time, but not by the crashing node, survived the
crash intact. I have rewritten the dumping part and moved from multiple
classes writing to the same file to having one class that stores all
variables and dumps on demand, which appears to work better (tested with
192 nodes). In the previous approach, I opened and closed the file many
times to add just one variable or one array of variables; maybe that was
the problem? Although I still find it weird that it worked flawlessly on
my personal computer ...
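
Roughly, the new approach is the sketch below; it is heavily simplified, the
class and member names are illustrative, and it reuses the H5Interface from
my earlier mail:

==================================================================

// Heavily simplified sketch of the "store everything, dump once" approach.
// Class and member names are illustrative, not the actual code.
#include <map>
#include <string>
#include <vector>

class ResultBuffer {
public:
    // Accumulate a rows x cols integer matrix in memory instead of touching
    // the file right away.
    void AddIntMatrix(const std::string& path, uint rows, uint cols,
                      const int* values) {
        Matrix& m = matrices_[path];
        m.rows = rows;
        m.cols = cols;
        m.values.assign(values, values + rows * cols);
    }

    // One open / write-everything / close pass per dump instead of one
    // open/close per variable.
    int DumpAll(const std::string& filename) {
        H5Interface h5;
        if (h5.OpenFile(filename, 2) != 0) return -1;   // flag 2 -> H5F_ACC_RDWR
        std::map<std::string, Matrix>::iterator it;
        for (it = matrices_.begin(); it != matrices_.end(); ++it) {
            Matrix& m = it->second;
            h5.WriteIntMatrix(it->first, m.rows, m.cols, &m.values[0]);
        }
        return 0;
    }

private:
    struct Matrix { uint rows; uint cols; std::vector<int> values; };
    std::map<std::string, Matrix> matrices_;
};

==================================================================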

Peter

Hi Peter,

which version of Lustre are you using? We once observed a very strange
corruption when we wrote HDF5 files to Lustre. That was seen with Lustre
client version 1.8.4. After we switched to 1.8.7, the problem disappeared.

Cheers,
Andy

Hi Andy,

thanks for the hint! The cluster I am working on currently runs 1.6.7.1
(from /proc/fs/lustre/version). The current solution of minimizing the
addition of data to existing files seems to be working fine, although it
is a bit inefficient. Maybe after the next update it will work again. I
can only hope they don't upgrade to 1.8.4 :)

bye, Peter
