Writing raw (byte array) data to dataset?

Hi,

Is there a way to write raw binary byte array data to an existing dataset
of a different type? E.g., if I have a byte array that represents an array
of doubles (the byte array thus has 8 times as many elements as the double
array, where each set of 8 bytes represents a double), can I somehow write
that data to a double dataset in an HDF5 file? Trying this the naïve way
with H5D.write just returns a -1 status.

The reason I don't just convert it to a double array before writing is
that I have an instrument which returns all its data in byte arrays, no
matter the type, and I'd otherwise have to write a converter for each of
the 10 different types it outputs.

Thank you,
Johan Lindberg


--
Dr. Johan E. Lindberg

Hi Johan,

The second argument to the H5Dwrite function (referring to the C API, not C++) sets the type of a single element expected in the data buffer. If you set it to H5T_NATIVE_DOUBLE, everything should be fine and the data should be written properly, since the data buffer is just a pointer (void*).

https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-Write

Usually a -1 status means that something is wrong with the memory space and file space (the 3rd and 4th arguments of H5Dwrite) and/or the dataset dimensions.
Please send some of your code (including how the dataset is created) for further investigation...

Regards,
Rafal

On 2017-10-18 at 14:21, Johan Lindberg wrote:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Hi Rafal,
(Sorry for misspelling your name earlier!)

Thank you for finding this error. I had that line in the code for appending
double data, but it somehow got lost when I wrote the code for appending
byte arrays. Now it works perfectly!

All the best,
Johan


Hi Johan,

Everything in your code seems fine, but since your initial dataset's
first dimension is "0" (no rows when the dataset is created), you have to call:

H5D.set_extent(datasetId, count);

right after you define the new space dimensions in "count".

See also my minimal working example in C++ (using the C API):

     const int rank = 2;
     const int nCols = 30;
     const int nRows = 5;
     hsize_t tdims[rank] = {0, nCols};
     hsize_t maxdims[rank] = {H5S_UNLIMITED, nCols};
     hid_t space_2d = H5Screate_simple(rank, tdims, maxdims);
     hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
     H5Pset_layout(plist, H5D_CHUNKED);
     hsize_t chunk[rank] = {1, nCols};
     H5Pset_chunk(plist, rank, chunk);

     hid_t datasetId = H5Dcreate(file_id, "TestDataset", H5T_NATIVE_DOUBLE,
                                 space_2d, H5P_DEFAULT, plist, H5P_DEFAULT);

     // Grow the dataset from 0 to nRows rows before writing.
     hsize_t newdims[rank] = {nRows, nCols};
     H5Dset_extent(datasetId, newdims);
     hid_t newspace = H5Dget_space(datasetId);
     hsize_t offset[rank] = {0, 0};
     H5Sselect_hyperslab(newspace, H5S_SELECT_SET, offset, NULL, newdims, NULL);

     hid_t tmemspace = H5Screate_simple(rank, newdims, NULL);
     double data[nCols * nRows];
     double step = 1.1;
     for (int r = 0; r < nRows; r++)
     {
         for (int c = 0; c < nCols; c++)
         {
             data[c + r * nCols] = step;
             step += 1.1;
         }
     }

     // The write buffer is untyped; a byte pointer works just as well.
     uint8_t *data_bytes = reinterpret_cast<uint8_t*>(data);

     H5Dwrite(datasetId, H5T_NATIVE_DOUBLE, tmemspace, newspace, H5P_DEFAULT, data_bytes);

     H5Sclose(newspace);
     H5Sclose(tmemspace);
     H5Dclose(datasetId);
     H5Pclose(plist);
     H5Sclose(space_2d);

Best regards,
Rafal

Hi Rafael,

Thank you for your reply!

I am using Visual Studio (C#) with HDF5 P/Invoke. While the syntax is a bit
different from C/C++, the HDF5 functions should behave the same.

I am indeed setting the type argument to the native double type. I paste my
(simplified, yet still quite lengthy) code below.

It is the status in the try block at the very end that returns -1. It
works fine if I pass the double[] valuesOneDim to H5D.write instead of the
byte[] byteData.

// Create an H5F file and H5G group (not included)
hid_t groupId = ...

// Create dataspace and dataset:
int nCols = 30;
int nRows = 5;
int rank = 2;
ulong[] dims = new ulong[2] { 0, (ulong)nCols };
ulong[] maxDims = new ulong[2] { H5S.UNLIMITED, (ulong)nCols };

string name = "TestDataset";

hid_t dataspaceId = H5S.create_simple(rank, dims, maxDims);
hid_t pList = H5P.create(H5P.DATASET_CREATE);
H5P.set_layout(pList, H5D.layout_t.CHUNKED);
H5P.set_chunk(pList, rank, new ulong[] { 1, maxDims[1] });
hid_t datasetId = H5D.create(groupId, name, H5T.NATIVE_DOUBLE, dataspaceId,
    H5P.DEFAULT, pList, H5P.DEFAULT);
H5P.close(pList);

// Generate a 2D (5x30) random double array and convert it to a 1D byte array.

Random random = new Random();
double[,] values = new double[nCols, nRows];
double[] valuesOneDim = new double[nCols * nRows];
int nBytes = 8;
byte[] byteData = new byte[nCols * nRows * nBytes];
for (int i = 0; i < nCols; i++)
{
    for (int j = 0; j < nRows; j++)
    {
        values[i, j] = random.NextDouble();
        valuesOneDim[i + nCols * j] = values[i, j];
        byte[] thisByteValue = BitConverter.GetBytes(values[i, j]);
        for (int k = 0; k < nBytes; k++)
        {
            byteData[k + nBytes * (i + nCols * j)] = thisByteValue[k];
        }
    }
}

// Write byte array to dataset

htri_t status = -1;

int arrayCols = (int)dims[1];
int existingRows = (int)dims[0];
int appendRows = byteData.Length / Marshal.SizeOf(typeof(double)) / arrayCols; // This number is 5, just like nRows.

int totalBytes = Marshal.SizeOf(typeof(double)) * arrayCols * appendRows; // = 5*30*8 = 1200

ulong[] appendDims = new ulong[] { (ulong)appendRows, (ulong)arrayCols }; // [5, 30]

hid_t memSpaceId = H5S.create_simple(2, appendDims, null);

ulong[] start = new ulong[2] { (ulong)existingRows, 0 }; // [0, 0]
ulong[] count = new ulong[2] { (ulong)appendRows, (ulong)arrayCols }; // [5, 30]

dataspaceId = H5D.get_space(datasetId);
H5S.select_hyperslab(dataspaceId, H5S.seloper_t.SET, start, null, count, null);

GCHandle handle = default(GCHandle);
try
{
    handle = GCHandle.Alloc(byteData, GCHandleType.Pinned);
    using (SafeArrayBuffer buffer = new SafeArrayBuffer(Marshal.AllocHGlobal(totalBytes)))
    {
        Marshal.Copy(byteData, 0, buffer.DangerousGetHandle(), totalBytes);
        status = H5D.write(datasetId, H5T.NATIVE_DOUBLE, memSpaceId,
            dataspaceId, H5P.DEFAULT, buffer.DangerousGetHandle());
    }
}
finally
{
    handle.Free();
}

// Close dataspaces, datasets, types, etc. (not included).

...


--
Dr. Johan E. Lindberg
Mobile phone: +46 (0)76-209 14 13
e-mail: jolindbe@gmail.com
