Cache dataspace of an HDF5 file

Hi,
  Is it possible for a user to cache the dataspace of a file? Consider the
following:

Step 1:
  /* Create the data space with unlimited dimensions. */
  mydataspace = H5Screate_simple (RANK, dims, maxdims);

Step 2:
  /* Create a new dataset on a file. */
  mydataset = H5Dcreate (file, DATASETNAME, H5T_NATIVE_INT, mydataspace,
                         H5P_DEFAULT, cparms, H5P_DEFAULT);

Step 3:
  myfilespace = H5Dget_space (mydataset);
  /* Do some operations on 'myfilespace', such as selecting a hyperslab */

In this case, do I need to obtain the dataspace for the data set 'mydataset'
in step 3? I would prefer to reuse the cached dataspace 'mydataspace' that was
passed into the creation of 'mydataset'. Could reusing the cached dataspace
cause any problems other than thread safety?

Regards,
Ravi

Hi Ravi,

  While I don't know the exact internals of HDF5, I would assume that in your case
"mydataspace" and "myfilespace" are exactly the same. The first is used to define
the properties of a dataspace, the second holds the properties of the same
dataspace, so they should be exactly the same.

Did you run into any problems with using "mydataspace" instead of "myfilespace"?
How would it affect thread safety?

  Werner

···

On Mon, 01 Feb 2010 16:44:13 +0100, Ravikiran Rajagopal <ravi.rajagopal@broadcom.com> wrote:

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

> while I don't know the exact internals of HDF5, I would assume that in
> your case "mydataspace" and "myfilespace" are exactly the same. The first
> is used to define the properties of a dataspace, the second one is holding
> the properties of the same dataspace, so it should be exactly the same.

I assume so as well, but was looking for confirmation.

> Did you run into any problems with using "mydataspace" instead of
> "myfilespace"?

No, but incorrect assumptions always seem to manifest themselves malignantly
during demos or critical production times - hence the need for confirmation.

> How would it affect thread safety?

Just the usual resource contention between threads: if another thread modifies
the dataspace of the dataset, then we could be left holding an outdated
reference. However, if all the identifiers are merely pointers to a single
underlying structure which is protected properly, then there are no
thread-safety issues.

Regards,
Ravi
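
The open question at this point in the thread is whether an identifier aliases the dataset's own dataspace or holds an independent copy. The following is a toy model in plain C of those two possibilities. It is not HDF5 code: `Extent`, `Dataset`, and every function name are hypothetical stand-ins, and the sketch makes no claim about which behavior the library actually has.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: 'Extent' stands in for a dataspace's current size.
 * None of these names are HDF5 API. */
typedef struct { size_t rows; } Extent;

/* The dataset owns the authoritative extent. */
typedef struct { Extent extent; } Dataset;

/* Possibility 1: the handle aliases the dataset's own extent. */
static const Extent *extent_alias(const Dataset *d) {
    assert(d != NULL);
    return &d->extent;
}

/* Possibility 2: the handle is a snapshot taken at call time. */
static Extent extent_snapshot(const Dataset *d) {
    assert(d != NULL);
    return d->extent;
}

/* Growing the dataset changes only the dataset's own extent. */
static void dataset_extend(Dataset *d, size_t new_rows) {
    assert(d != NULL);
    d->extent.rows = new_rows;
}
```

Under the alias model a cached handle always sees the new size after `dataset_extend`; under the snapshot model it keeps reporting the old size until it is reacquired.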

···

On Monday 01 February 2010 12:07:27 pm Werner Benger wrote:

>> while I don't know the exact internals of HDF5, I would assume that in
>> your case "mydataspace" and "myfilespace" are exactly the same. The first
>> is used to define the properties of a dataspace, the second one is holding
>> the properties of the same dataspace, so it should be exactly the same.
>
> I assume so as well, but was looking for confirmation.
>
>> Did you run into any problems with using "mydataspace" instead of
>> "myfilespace"?
>
> No, but incorrect assumptions always seem to manifest themselves malignantly
> during demos or critical production times - hence the need for confirmation.

Even if the returned HDF5 IDs were numerically different, they should refer
to an object with the same properties.

There might be something like a "shared dataspace" in the future, which could
also hold attributes and such, but as long as this is not implemented, there
should be no differences.

>> How would it affect thread safety?
>
> Just the usual resource contention between threads: if another thread modifies
> the dataspace of the dataset, then we could be left holding an outdated
> reference. However, if all the identifiers are merely pointers to a single
> underlying structure which is protected properly, then there are no
> thread-safety issues.

Yes, the HDF5 IDs are just references to internal structures, and as long
as HDF5 is compiled with the thread-safety option, they are protected through
the HDF5 API.

cheers,
  Werner

···

On Mon, 01 Feb 2010 19:38:55 +0100, Ravikiran Rajagopal <ravi.rajagopal@broadcom.com> wrote:

On Monday 01 February 2010 12:07:27 pm Werner Benger wrote:

Unfortunately, this turned out not to be true. If a dataset is modified (e.g.,
by extending it), then a cached dataspace is no longer valid. So it seems that
every time a dataset is modified, its dataspace must be reacquired. This has
performance implications since function calls are expensive (even to
H5Dget_space). I will need to cache data in my application and minimize the
number of calls to the combination of
  H5Dset_extent + H5Dget_space + H5Sselect_hyperslab + H5Dwrite
which complicates my code a little bit.

Regards,
Ravi
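
One way to read the plan above (cache data in the application, call the library less often) is a small row buffer that hands rows to a callback in batches. The following is a minimal sketch in plain C under stated assumptions: `RowBuffer`, `rowbuf_*`, `count_flush`, `BATCH_ROWS`, and `ROW_LEN` are hypothetical names invented for illustration, and the callback here only counts rows. In a real program the callback body would hold the extend, reacquire, select, and write sequence, executed once per batch rather than once per row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_ROWS 4   /* illustrative batch size; tune for real workloads */
#define ROW_LEN    3   /* illustrative number of ints per row */

/* Called once per batch. In an HDF5 program this is where
 * H5Dset_extent + H5Dget_space + H5Sselect_hyperslab + H5Dwrite would go.
 * Returns 0 on success. */
typedef int (*flush_fn)(const int *rows, size_t nrows, void *ctx);

typedef struct {
    int      buf[BATCH_ROWS][ROW_LEN];
    size_t   used;    /* rows currently buffered */
    flush_fn flush;
    void    *ctx;
} RowBuffer;

static void rowbuf_init(RowBuffer *rb, flush_fn flush, void *ctx) {
    assert(rb != NULL && flush != NULL);
    rb->used = 0;
    rb->flush = flush;
    rb->ctx = ctx;
}

/* Hand any buffered rows to the callback; keep them if it fails. */
static int rowbuf_flush(RowBuffer *rb) {
    if (rb->used == 0) return 0;
    int status = rb->flush(&rb->buf[0][0], rb->used, rb->ctx);
    if (status == 0) rb->used = 0;
    return status;
}

/* Buffer one row; flush automatically when the batch is full. */
static int rowbuf_append(RowBuffer *rb, const int row[ROW_LEN]) {
    if (rb->used == BATCH_ROWS) {          /* earlier flush failed: retry */
        int status = rowbuf_flush(rb);
        if (status != 0) return status;
    }
    memcpy(rb->buf[rb->used], row, sizeof rb->buf[0]);
    rb->used++;
    return rb->used == BATCH_ROWS ? rowbuf_flush(rb) : 0;
}

/* Stand-in callback that only counts; a real one would write to the file. */
typedef struct { size_t calls; size_t rows; } FlushStats;

static int count_flush(const int *rows, size_t nrows, void *ctx) {
    (void)rows;
    FlushStats *s = ctx;
    s->calls++;
    s->rows += nrows;
    return 0;
}
```

With a batch size of 4, appending ten rows invokes the callback twice, and a final `rowbuf_flush` call delivers the remaining two rows. The trade-off is that buffered rows are not in the file until the next flush.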

···

On Monday 01 February 2010 01:46:00 pm Werner Benger wrote:
