streaming HDF?

I'm currently developing a data archive storage system for GPS data that
is being received, and needs to be stored, in real time.

The storage itself doesn't seem to be that big a problem. The problem
comes when we have applications that also need to read that data in
real time. I've been reading the documentation and doing some
experimenting, but I haven't come up with a workable implementation.

First, doing a reopen where compression is involved has resulted in
decompression errors. That's not terribly surprising, but it would be
nice to support compression and data following (along the same lines as
tail -f) at the same time, if at all possible.

Second, even when I did a reopen, I never saw any of the new data. Is
closing and opening the file again the only way to achieve this, or is
there a smarter way?

Third, I'm using packet tables of compound datatypes, due to the
complexity of the data being stored.
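
For reference, our openOrCreatePT wrapper (used in the code below) boils
down to something like this sketch; the compound type here is trimmed to
three stand-in fields and the table name is illustrative:

#include <H5Cpp.h>
#include "H5PacketTable.h"

// Stand-in record; the real compound type has many more fields.
struct NavData
{
   int    year;
   int    doy;
   double sod;
};

FL_PacketTable* createNavTable(H5::H5File& file)
{
   // Build a compound datatype mirroring the in-memory struct layout.
   H5::CompType ctype(sizeof(NavData));
   ctype.insertMember("year", HOFFSET(NavData, year), H5::PredType::NATIVE_INT);
   ctype.insertMember("doy",  HOFFSET(NavData, doy),  H5::PredType::NATIVE_INT);
   ctype.insertMember("sod",  HOFFSET(NavData, sod),  H5::PredType::NATIVE_DOUBLE);

   // Fixed-length packet table, chunks of 100 packets; the last argument
   // is the deflate level (0-9), or -1 for no compression.
   char name[] = "/nav200_2";
   return new FL_PacketTable(file.getId(), name, ctype.getId(), 100, -1);
}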

Here's a bit of the code that I've been trying to implement (hopefully
it's not excessive for the list):

#pragma ident "$Id: $"

/* This program is an attempt to determine the behavior of HDF5 when
 * you have multi-process access: one process writes, one reads. It
 * also attempts to implement file following. */

#include <unistd.h>
#include <iostream>
#include <sstream>   // for ostringstream
#include <H5Cpp.h>
#include "Exception.hpp"
#include "H5PacketTable.h"
#include "MSNArchive.hpp"
#include "PointerKiller.hpp"

using namespace std;

// create and write to the file
class ParentProcess
{
public:
   ParentProcess()
         : archive("archive.h5", H5F_ACC_TRUNC)
   {
      // make sure we immediately get the superblock written to the file
      archive.flush(H5F_SCOPE_GLOBAL);
   }

   void parentProcess()
   {
      FL_PacketTable *navTable(
         msnh5::MSNArchive::openOrCreatePT(archive,
                                           msnh5::MSNArchive::ttNav200_2Table));
      // RAII guard that deletes navTable when it goes out of scope
      sglstd::PointerKiller<FL_PacketTable> pknt(navTable);

      while (true)
      {
         msnh5::NavBits200_2Type::Data data;
         data.storeTime = gpstk::DayTime();

         if (navTable->AppendPacket((void*)(&data)) < 0)
            cerr << "failed to append nav packet" << endl;
         else
         {
            ostringstream s;
            s << "parent wrote packet stamped @ " << data.storeTime.year
              << "/" << data.storeTime.doy << "/" << data.storeTime.sod
              << endl;
            cerr << s.str();
         }

         archive.flush(H5F_SCOPE_GLOBAL);
         sleep(6);
      }
   }

   H5::H5File archive;
};

class ChildProcess
{
public:
   ChildProcess()
         : archive("archive.h5", H5F_ACC_RDONLY)
   {
   }

   void childProcess()
   {
      int rc = 0;   // status from GetNextPacket

      while (true)
      {
         msnh5::NavBits200_2Type::Data data;
         FL_PacketTable *navTable(
            msnh5::MSNArchive::openOrCreatePT(archive,
                                              msnh5::MSNArchive::ttNav200_2Table));
         sglstd::PointerKiller<FL_PacketTable> pknt(navTable);

         while ((rc = navTable->GetNextPacket(&data)) >= 0)
         {
            ostringstream s;
            s << "child Read packet stamped @ " << data.storeTime.year
              << "/" << data.storeTime.doy << "/" << data.storeTime.sod
              << endl;
            cerr << s.str();
         }

         sleep(1);

         // re-open the file, hoping to pick up the writer's new data
         archive.reopen();
      }
   }

   H5::H5File archive;
};

int main(int argc, char *argv[])
{
   msnh5::MSNArchive::initialize();
// H5::Exception::dontPrint();

   pid_t pid = fork();

   if (pid == 0)
   {
      cerr << "I'm the child" << endl;
      sleep(10);

      ChildProcess cp;

      cp.childProcess();
   }
   else if (pid > 0)
   {
      cerr << "I'm the parent of child pid " << pid << endl;
      ParentProcess pp;

      pp.parentProcess();
   }
   else if (pid < 0)
   {
      perror("fork");
   }

   return 0;
}

Hi John,

···

On Apr 9, 2010, at 10:44 AM, John Knutson wrote:

I'm currently developing a data archive storage system for GPS data that
is being received, and needs to be stored, in real time.

  Currently, the HDF5 library doesn't support multiple processes accessing the same file where one or more of those processes is writing to or modifying the file. We are going to have some form of single-writer/multiple-reader access in the next major release, but it's still too early to make snapshots of this feature available for external testing.

  Quincey


Quincey Koziol wrote:

  Currently, the HDF5 library doesn't support multiple processes accessing the same file where one or more of those processes is writing to or modifying the file.

Is there any way of working around this limitation? That's a pretty
serious limitation for our purposes. Is there any kind of schedule that
might hint at when such a capability might become available?

Hi John,

···

On Apr 11, 2010, at 11:49 AM, John Knutson wrote:

Is there any way of working around this limitation? That's a pretty
serious limitation for our purposes. Is there any kind of schedule that
might hint at when such a capability might become available?

  If you are interested in an alpha/early beta snapshot that contains some support for the single-writer/multiple-reader (SWMR) feature, I think we should have one ready in about a month. We don't currently have any plans to add multiple-writer support.

  Quincey

Quincey Koziol wrote:

    If you are interested in an alpha/early beta snapshot that contains some support for the single-writer/multiple-reader (SWMR) feature, I think we should have one ready in about a month.

We might be, if it's that soon.

In the meantime, it seems I can work around the issue by closing and opening the HDF5 file again, but being new to HDF5, I'm unsure what the potential problems with that are.
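
Roughly, here's the shape of the reader loop I have in mind (an untested
sketch; NavData and the table name are illustrative stand-ins for our
real types):

#include <unistd.h>
#include <iostream>
#include <H5Cpp.h>
#include "H5PacketTable.h"

struct NavData { int year; int doy; double sod; };

void followTable()
{
   hsize_t nextIndex = 0;   // first packet we haven't seen yet

   while (true)
   {
      // Open the file fresh on every poll; a handle held across the
      // writer's updates appears to serve stale cached metadata.
      H5::H5File archive("archive.h5", H5F_ACC_RDONLY);
      char name[] = "/nav200_2";
      FL_PacketTable nav(archive.getId(), name);

      // If the table isn't there yet, GetPacketCount just fails and
      // we retry on the next poll.
      int err = 0;
      hsize_t count = nav.GetPacketCount(err);

      for (hsize_t i = nextIndex; err >= 0 && i < count; ++i)
      {
         NavData data;
         if (nav.GetPacket(i, &data) >= 0)
            std::cerr << "read packet " << i << " @ " << data.year
                      << "/" << data.doy << "/" << data.sod << std::endl;
      }
      if (err >= 0)
         nextIndex = count;

      // nav and archive close as they go out of scope here
      sleep(1);
   }
}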

Hi John,

···

On Apr 12, 2010, at 10:09 AM, John Knutson wrote:

We might be, if it's that soon.

  OK, I'll post something to the list when I have a useful snapshot ready.

In the meantime, it seems I can work around the issue by closing and opening the HDF5 file again, but being new to HDF5, I'm unsure what the potential problems with that are.

  That may be fine, but could you describe your usage scenario a bit more?

  Quincey

Quincey Koziol wrote:

  That may be fine, but could you describe your usage scenario a bit more?

Okay, we're collecting data in near real time from a number of sources that are identical in all relevant ways. This works out to roughly 100-200 "messages" arriving every 6 seconds. That data needs to be archived, which is what we're currently looking at HDF5 for.

A small number of applications that use the data are expected to be able to get current updates as the data is being written. Currently the data is stored in flat files, which are followed in a manner similar (if not identical) to tail -f. The implementation with HDF5 obviously won't be the same, but we need some way to pick up file updates at a roughly 1-second rate or better.

Hi John,

···

On Apr 13, 2010, at 3:12 PM, John Knutson wrote:

Okay, we're collecting data in near real time from a number of sources that are identical in all relevant ways. This works out to roughly 100-200 "messages" arriving every 6 seconds.

  Is there only one process writing to the file?

A small number of applications that use the data are expected to be able to get current updates as the data is being written, at a roughly 1-second rate or better.

  Assuming that there is only one process writing to the file, you have provided a very accurate description of the primary SWMR use case. :-) Until we have something ready for testing, you can open and close the file according to the procedure described here: http://www.hdfgroup.org/hdf5-quest.html#grdwt
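
In outline, that procedure amounts to the following (an untested sketch
with error checking omitted; your packet-table appends and reads go where
the comments indicate):

#include <hdf5.h>

/* Writer: after each batch of appends, flush so readers can see it. */
void writer_cycle(hid_t file_id)
{
    /* ... append packets ... */
    H5Fflush(file_id, H5F_SCOPE_GLOBAL);
}

/* Reader: open fresh on each poll and close afterward, so no stale
 * cached metadata is carried from one poll to the next. */
void reader_cycle(void)
{
    hid_t file_id = H5Fopen("archive.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    /* ... read any packets beyond the last index seen ... */
    H5Fclose(file_id);
}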

  Quincey
