Adding data dynamically to a dataset in HDF Java

Hi Peter

I'm trying to do it by reading chunk by chunk, but I'm having trouble
creating the dataset. In example [1] it's done like this:

H5.H5Dcreate(file_id, DATASETNAME,
                        HDF5Constants.H5T_STD_I32LE, dataspace_id, dcpl_id);

The type there is for int, but I can't seem to find the correct one for
strings. In example [2], with string arrays, it looks like this:

H5.H5Dcreate(file_id, DATASETNAME, filetype_id,
                        dataspace_id, HDF5Constants.H5P_DEFAULT);

If I create the dataset like this, when I want to add data dynamically I can
only get the first byte of each string. Any tips on what type I should
use?

Håkon

[1]
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java

[2]
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datatypes/H5Ex_T_String.java


--
Håkon Sagehaug, Scientific Programmer
Parallab, Uni BCCS/Uni Research
Hakon.Sagehaug@uni.no, phone +47 55584125

Håkon,

There was a typo in my previous email. You do NOT need to read the first chunk in order
to write the second chunk. You can just select whatever chunks you want to write.
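
For example, to write the second chunk directly (a rough sketch; dataset_id,
offset and count are placeholders, not from the examples):

        int fsid = H5.H5Dget_space(dataset_id);
        H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                new long[] { offset }, null, new long[] { count }, null);
        // then call H5Dwrite() with this selection as the file space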

Sorry for the confusion.

Thanks
--pc


Hi Peter

My problem actually comes before that: when I create the dataset the first
time, I can't figure out the correct data type to use. I guess I should use a
byte type, since the strings are converted to bytes.

Håkon


Hi Hakon,

I assume you are using a 1D array of strings. Here are some hints:

1) You can just use a string datatype: a variable-length string if your strings have different sizes,
     or a fixed-length string if your strings are about the same length, e.g.
            tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            // for a fixed length of 128
            H5.H5Tset_size(tid, 128);
            // for variable length
            H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
2) Set the dataset creation property list for chunking and compression:
                plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
                H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                H5.H5Pset_chunk(plist, 1, new long[] {1024}); // aim for a chunk size of about 2MB for best performance
                H5.H5Pset_deflate(plist, 5);

3) Set the dimension sizes, e.g.
            sid = H5.H5Screate_simple(1, new long[]{25000000}, new long[] {HDF5Constants.H5S_UNLIMITED});
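
Putting these together, creating the dataset might look like this (an untested
sketch; file_id comes from H5Fcreate() and "/strs" is just an example name):

            tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE); // or a fixed size
            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, 1, new long[] {1024});
            H5.H5Pset_deflate(plist, 5);
            sid = H5.H5Screate_simple(1, new long[]{25000000},
                    new long[] {HDF5Constants.H5S_UNLIMITED});
            // H5Dcreate() is what ties the dataset to the file
            did = H5.H5Dcreate(file_id, "/strs", tid, sid, plist);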

Thanks
--pc


Hi

So I tried, but I'm not sure it worked. I can't figure out how to connect the
dataset to a file so that I can view it in HDFView. Here is my method for
writing an array of Strings to a dataset:

private static void writeUn() {
        int file_id = -1;
        int dcpl_id = -1;
        int dataspace_id = -1;
        int dataset_id = -1;
        int memtype_id = -1;
        int filetype_id = -1;
        int plist = -1;

        long[] dims = { DIM_X };
        long[] chunk_dims = { CHUNK_X };
        long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
        byte[][] dset_data = new byte[DIM_X][SDIM];
        StringBuffer[] str_data = new StringBuffer[DIM_X];

        // Initialize the dataset.
        for (int indx = 0; indx < DIM_X; indx++)
            str_data[indx] = new StringBuffer(String.valueOf("iteration "
                    + (indx + 1)));

        // Create a new file using default properties.
        try {
            file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
                    HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
        } catch (Exception e) {
            e.printStackTrace();
        }

        try {
            filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            H5.H5Tset_size(filetype_id, SDIM);

            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, 1, new long[] { 1024 });

            H5.H5Pset_deflate(plist, 5);

             dataset_id = H5.H5Screate_simple(1, new long[] { 25000000 },
                     new long[] { HDF5Constants.H5S_UNLIMITED });

        } catch (Exception e) {
            e.printStackTrace();
        }

        // Write the data to the dataset.
        try {
            for (int indx = 0; indx < DIM_X; indx++) {
                for (int jndx = 0; jndx < SDIM; jndx++) {
                    if (jndx < str_data[indx].length())
                        dset_data[indx][jndx] = (byte) str_data[indx]
                                .charAt(jndx);
                    else
                        dset_data[indx][jndx] = 0;
                }
            }
            if ((dataset_id >= 0) && (memtype_id >= 0))
                H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
                        HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                        HDF5Constants.H5P_DEFAULT, dset_data);
        } catch (Exception e) {
            e.printStackTrace();
        }
}

So my question is: is this the correct way of doing it, and how do I connect
the dataset to the file? I guess it happens at the time the dataset is created.

After this, is the way forward like this?

1. H5.H5Dextend(dataset_id, extdims);
2. dataspace_id = H5.H5Dget_space(dataset_id);
3. H5.H5Sselect_all(dataspace_id);
   // Subtract a hyperslab reflecting the original dimensions from
   // the
   // selection. The selection now contains only the newly extended
   // portions of the dataset.
   count[0] = dims[0];
   count[1] = dims[1];
   H5.H5Sselect_hyperslab(dataspace_id,
                                HDF5Constants.H5S_SELECT_NOTB, start, null,
                                count, null);

   // Write the data to the selected portion of the dataset.
   if (dataset_id >= 0)
         H5.H5Dwrite(dataset_id, HDF5Constants.H5T_NATIVE_INT,
              HDF5Constants.H5S_ALL, dataspace_id,
              HDF5Constants.H5P_DEFAULT, extend_dset_data);

I also see that H5.H5Dextend is deprecated as of version 1.8; is there
another method to use?

cheers, Håkon


Hi Håkon,

A minor change to writeUn(): for testing purposes, use a String[] array instead of StringBuffer and
pass the string array directly to H5Dwrite(). Currently, H5Dwrite() in hdf-java does not handle 2D arrays
because of performance issues. In your case, you do not need to call H5Dwrite() in writeUn() at all, since
you are going to write the data part by part later.

H5.H5Dextend() will be replaced with H5Dset_extent() in HDF5 1.8. The new HDF5 1.8 APIs are
not supported in hdf-java. We are still working on this. For now, just use H5Dextend().
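
For example (a sketch; extdims holds the new total size, a placeholder here):

        long[] extdims = { newTotalSize };
        H5.H5Dextend(dataset_id, extdims);
        // re-query the file space after extending
        dataspace_id = H5.H5Dget_space(dataset_id);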

When you create the dataset, you already have the space for 25M strings (i.e. new long[] { 25000000 }).
Do you want to extend your dataspace beyond that? If not, you do not need to call H5Dextend(); just
select the chunks you want to write.

Thanks
--pc


Hi Peter,

Thanks for the reply. Sorry if I'm asking a lot of questions, but I still
can't figure out how to connect the writing of a dataset to a file.
We want to give the HDF file to another program that does some analysis
for us. I guess my pseudocode looks something like this.

Assume that the file has 25M lines

List lineEntry
int datasetSize = 25000000
int index = 0;
//Number of strings to write in one chunk
int maxRecords = 1500000;
for(String line in file F) {
    lineEntry.add(line)
    if(index % maxRecords == 0){
       String [] valuesToWrite = lineEntry.toArray();

       //Find out where in the dataset to start the writing from, using a hyperslab
       int dataspace_id = H5.H5Dget_space(dataset_id);
       H5.H5Sselect_all(dataspace_id);
       //On the 2nd iteration the start index for writing is 1500000*2,
       //so the writing will start from 3M -> 4.5M
       H5.H5Sselect_hyperslab(dataspace_id,
                               HDF5Constants.H5S_SELECT_NOTB, start, null,
                               count, null);
       H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
                        HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                        HDF5Constants.H5P_DEFAULT, valuesToWrite);
        lineEntry.clear();

     }
    index++
}

Does this look correct? Also, I don't need to extend the dataset if I can
allocate 25M entries, as long as I don't have to keep them all in memory at
the same time.

cheers, Håkon


Hi Håkon,

From your pseudocode, I can see you are writing 1.5M lines at a time.

Basically your program is right. One minor change to H5Dwrite():

       H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1, msid, fsid,
                        HDF5Constants.H5P_DEFAULT, valuesToWrite);

where msid is the memory space, which you can get from H5Screate_simple(),
and fsid is the file space, which you can get from H5Dget_space(). You need to
pass fsid to H5Sselect_hyperslab() to select the part to write.
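
In outline (a sketch, using the start and count arrays from your pseudocode):

        int tid = H5.H5Dget_type(dataset_id);
        int fsid = H5.H5Dget_space(dataset_id);          // file space
        int msid = H5.H5Screate_simple(1, count, null);  // memory space
        H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                start, null, count, null);
        H5.H5Dwrite(dataset_id, tid, msid, fsid,
                HDF5Constants.H5P_DEFAULT, valuesToWrite);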

(If I can find time over the weekend, I will write a simple program based on
your pseudocode.)

Thanks
--pc


Hi Håkon,

Below is a program that you can start with. I am using variable-length strings.
For fixed-length strings there is some extra work: you may have to make the
strings the same length.

You may try different chunk sizes and block sizes to get the best performance.


=======================
import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

public class CreateStrings {

    private final static String H5_FILE = "G:\\temp\\strings.h5";
    private final static String DNAME = "/strs";
    private final static int RANK = 1;
    private final static long[] DIMS = { 25000000 };
    private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
    private final static long[] CHUNK_SIZE = { 25000 };
    private final static int BLOCK_SIZE = 250000;

    private void createDataset(int fid) throws Exception {
        int did = -1, tid = -1, sid = -1, plist = -1;

        try {
            tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            // use variable length to save space
            H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
            sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);

            // figure out creation properties
            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);

            did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
        } finally {
            try {
                H5.H5Pclose(plist);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void writeData(int fid) throws Exception {
        int did = -1, tid = -1, msid = -1, fsid = -1;
        long[] count = { BLOCK_SIZE };

        try {
            did = H5.H5Dopen(fid, DNAME);
            tid = H5.H5Dget_type(did);
            fsid = H5.H5Dget_space(did);
            msid = H5.H5Screate_simple(RANK, count, null);
            String[] strs = new String[BLOCK_SIZE];

            int idx = 0, block_indx = 0, start_idx = 0;
            long t0 = 0, t1 = 0;
            t0 = System.currentTimeMillis();
            System.out.println("Total number of blocks = "
                    + (DIMS[0] / BLOCK_SIZE));
            for (int i = 0; i < DIMS[0]; i++) {
                strs[idx++] = "str" + i;
                if (idx == BLOCK_SIZE) { // operator % is very expensive
                    idx = 0;
                    H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                            new long[] { start_idx }, null, count, null);
                    H5.H5Dwrite(did, tid, msid, fsid,
                            HDF5Constants.H5P_DEFAULT, strs);

                    if (block_indx == 10) {
                        t1 = System.currentTimeMillis();
                        System.out.println("Total time (minutes) = "
                                + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE)) / 1000
                                / 600);
                    }

                    block_indx++;
                    start_idx = i;
                }

            }

        } finally {
            try {
                H5.H5Sclose(fsid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(msid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void createFile() throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            createDataset(fid);
            writeData(fid);
        } finally {
            H5.H5Fclose(fid);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            (new CreateStrings()).createFile();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

}
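
For the fixed-length case mentioned above, one way is to pad each string to a
fixed record size and write bytes instead (a sketch; SDIM is an assumed fixed
element size, set with H5Tset_size(tid, SDIM) at creation time instead of
H5T_VARIABLE):

            byte[] buf = new byte[BLOCK_SIZE * SDIM];
            for (int n = 0; n < BLOCK_SIZE; n++) {
                byte[] b = strs[n].getBytes();
                // copy at most SDIM bytes; the rest stays 0 (null padding)
                System.arraycopy(b, 0, buf, n * SDIM, Math.min(b.length, SDIM));
            }
            H5.H5Dwrite(did, tid, msid, fsid, HDF5Constants.H5P_DEFAULT, buf);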

Hi Peter,

Thanks so much for the code, it seems to work very well. The only thing I
found was that the start index for the next block to write in the HDF array
was off by one: when idx reaches BLOCK_SIZE at loop index i, the elements up
to and including index i have just been written, so the next block should
start at i + 1. So instead of

    start_idx = i;

I now have

    start_idx = i + 1;

cheers, Håkon


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hi Håkon,

Glad to know it works for you. You also need to take care of the case where
the last block is smaller than BLOCK_SIZE. This happens when the total size
(25M) is not divisible by BLOCK_SIZE. For better performance, make sure that
BLOCK_SIZE is a multiple of CHUNK_SIZE.
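
A minimal sketch of that last write, reusing the names from writeData (a
starting point only, I have not run it): after the loop, idx holds the
number of leftover strings, so the memory space and the hyperslab count
must shrink to idx.

    // after the main loop: strs[0..idx-1] holds the leftover strings
    if (idx > 0) {
        long[] remainder = { idx };
        int msid2 = H5.H5Screate_simple(RANK, remainder, null);
        H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                new long[] { start_idx }, null, remainder, null);
        String[] tail = new String[idx];
        System.arraycopy(strs, 0, tail, 0, idx); // pass exactly idx strings
        H5.H5Dwrite(did, tid, msid2, fsid, HDF5Constants.H5P_DEFAULT, tail);
        H5.H5Sclose(msid2);
    }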

Thanks
--pc

Hi Peter

Thanks for all the help so far. I've added code to write the last elements;
if you want it, I can paste it in a new email to you. One more question: we
need to compress the data. I've now tried it like this, within
createDataset(...)

H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
H5.H5Pset_chunk(plist, RANK, chunkSize);
H5.H5Pset_deflate(plist, 9);

I'm not sure what the most efficient way is. I tried to exchange the
H5Pset_deflate(plist, 9) with

H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

but did not see any difference. I read that szip might be better. Without
deflate the HDF file is 1.5 GB; with deflate it's 1.3 GB. So my hope is
that it can be decreased further in size.

cheers, Håkon

Hi Håkon,

I don't need the code; as long as it works for you, I am happy.

Deflate level 6 is a good compromise between file size and performance.
The compression ratio depends on the content: if every string is like a
random set of characters, compression will not help much. I will leave it
to you to try the different compression options. If compression does not
help much, it is better not to use compression at all. It's your call.

Thanks
--pc

Hi

Yes, the content is more or less a random set of characters. I'll try some
combinations and see what works best. We need to transfer the file over a
network, which is why we want to compress as much as possible. Does the
block size/chunk size make any difference?

cheers, Håkon

For compression, block size does not matter; chunk size does. A larger chunk
size usually compresses better. We typically use 64 KB to 1 MB chunks for
good performance. Try different chunk sizes, block sizes, and compression
methods and levels to find the best I/O performance and compression ratio.
As I mentioned earlier, if the content is random, compression will not help
much.
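
For example (a sketch only; the 512 KB target is an arbitrary pick from that
range, and the element-size arithmetic applies to fixed-size elements such
as 4-byte ints, not to variable-length strings, whose stored elements are
heap references): H5Pset_chunk takes the chunk size in elements per
dimension, so divide the byte budget by the element size first.

    // aim for roughly 512 KB chunks of 4-byte integers
    final int TARGET_CHUNK_BYTES = 512 * 1024;
    final int ELEMENT_SIZE = 4; // H5T_STD_I32LE is 4 bytes
    long[] chunkSize = { TARGET_CHUNK_BYTES / ELEMENT_SIZE }; // 131072 elements

    int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
    H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
    H5.H5Pset_chunk(plist, RANK, chunkSize);
    H5.H5Pset_deflate(plist, 6); // the size/speed compromise mentioned earlier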

Thanks
--pc

Hi again,

In my current application, the one you helped with, I know beforehand how
many lines I want to write, but I also wanted to get the use case working
where I don't know this in advance. So I tried to modify the program you
more or less wrote for me. My test file contains many lines, each with one
integer on it, like this:

31643
36594
59354
2481
64079
64181
491566836

In the test program below, I've set the initial size of the segments written
to the dataset to 2. After the first segment is written I need to extend the
dataset, select the portion to write using a hyperslab, and then write it,
but I'm having some problems. I tried to follow the example here [1], but
have not succeeded. I've pasted the code below.

import java.io.File;
import java.util.Collection;

import org.apache.commons.io.FileUtils;

import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

// BigFile (used below) is a line-iterating helper class, not shown here.
public class HDFExtendLDData {
    private final static String H5_FILE = "/scratchtestHap/strings.h5";
    private final static String DNAME_SNP = "/snp.id.one";
    private final static int RANK = 1;
    private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };

    /***
     * Creates a dataset for holding values of type integer, with a given
     * dimension, chunking and a group name.
     *
     * @param fid
     * @param dims
     * @param chunkSize
     * @param groupName
     * @throws Exception
     */
    private void createIntegerDataset(int fid, long[] dims, long[] chunkSize,
            String groupName) throws Exception {
        int did_snp = -1, type_int_id = -1, sid = -1, plist = -1, group_id = -1;

        try {
            type_int_id = H5.H5Tcopy(HDF5Constants.H5T_STD_I32LE);

            sid = H5.H5Screate_simple(RANK, dims, MAX_DIMS);

            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);

            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);

            H5.H5Pset_chunk(plist, RANK, chunkSize);

            H5.H5Pset_deflate(plist, 6);

            group_id = H5.H5Gcreate(fid, groupName, HDF5Constants.H5P_DEFAULT);

            did_snp = H5.H5Dcreate(group_id, groupName + DNAME_SNP,
                    type_int_id, sid, plist);

            System.out.println("created for chr " + groupName);

        } finally {
            try {
                H5.H5Pclose(plist);

            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);

            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did_snp);

                H5.H5Gclose(group_id);
            } catch (HDF5Exception ex) {
            }

        }
    }

    /***
     * Input a directory path that contains some files. Need to extract the
     * data from the files and create one data set for each file within one
     * hdf file.
     *
     * @param fid
     * Hdf File id
     * @param sourceFolder
     * Path to the folder
     * @throws Exception
     */
    private void writeDataFromFileToInt(int fid, String sourceFolder)
            throws Exception {
        int did_SNP = -1, msid = -1, fsid = -1, timesWritten = 0, group_id = -1;

        Collection<File> fileCollection = FileUtils.listFiles(new File(
                sourceFolder), new String[] { "txt" }, false);

        int filesAdded = 0;

        /* Loop through a directory of files. */
        for (File sourceFile : fileCollection) {
            try {
                String choromosome_tmp = sourceFile.getName().split("_")[1];
                String choromosome = choromosome_tmp.substring(3,
                        choromosome_tmp.length());
                choromosome = "/" + choromosome + "/";

                /* Setting the initial size of the data set */
                long[] DIMS = { 2 };
                long[] CHUNK_SIZE = { 2 };
                int BLOCK_SIZE = 2;

                long[] count = { BLOCK_SIZE };

                /* Creates a new data set for the file to parse */
                createIntegerDataset(fid, DIMS, CHUNK_SIZE, choromosome);

                /* open the group that holds the data set */
                group_id = H5.H5Gopen(fid, choromosome);

                /* open the data set */
                did_SNP = H5.H5Dopen(group_id, choromosome + DNAME_SNP);

                /* fetches the data type, should be integer */
                int type_int_id = H5.H5Dget_type(did_SNP);

                fsid = H5.H5Dget_space(did_SNP);

                /* Memory space */
                msid = H5.H5Screate_simple(RANK, count, null);

                /* Array for storing the values */
                int[] currentSNPIdArray = new int[BLOCK_SIZE];

                /* File to read the values from */
                BigFile ldFile = new BigFile(sourceFile.getAbsolutePath());

                int idx = 0, block_indx = 0, start_idx = 0;
                System.out.println("Started to parse the file");

                int currentLine = 0;
                timesWritten = 0;

                /* Iterating over each line in the file */
                for (String ldLine : ldFile) {

                    currentSNPIdArray[idx] = Integer.valueOf(ldLine);

                    idx++;

                    if (idx == BLOCK_SIZE) {
                        idx = 0;
                        if (timesWritten == 0) {
                            /* Just write to the data set */
                        H5.H5Sselect_hyperslab(fsid,
                                HDF5Constants.H5S_SELECT_SET,
                                new long[] { start_idx }, null,
                                count, null);
                            H5.H5Dwrite(did_SNP, type_int_id, msid, fsid,
                                    HDF5Constants.H5P_DEFAULT,
                                    currentSNPIdArray);
                        } else {
                            /* Need to extend the data set */
                            H5.H5Dextend(did_SNP, DIMS);
                            int extended_dataspace_id = H5.H5Dget_space(did_SNP);
                            H5.H5Sselect_all(extended_dataspace_id);
                            H5.H5Sselect_hyperslab(extended_dataspace_id,
                                    HDF5Constants.H5S_SELECT_SET,
                                    new long[] { start_idx }, null,
                                    count, null);
                            H5.H5Dwrite(did_SNP, type_int_id, msid,
                                    extended_dataspace_id,
                                    HDF5Constants.H5P_DEFAULT,
                                    currentSNPIdArray);
                        }

                        block_indx++;
                        start_idx = currentLine + 1;
                        timesWritten++;

                    }

                    currentLine++;

                }
                filesAdded++;

                System.out.println("Finished parsing the file ");

            } finally {
                try {
                    H5.H5Gclose(group_id);
                    H5.H5Sclose(fsid);

                } catch (HDF5Exception ex) {
                }
                try {
                    H5.H5Sclose(msid);
                } catch (HDF5Exception ex) {
                }
                try {
                    H5.H5Dclose(did_SNP);
                } catch (HDF5Exception ex) {
                }
            }
        }

    }

    public void createFile(String sourceFile) throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            writeDataFromFileToInt(fid, sourceFile);
        } finally {
            H5.H5Fclose(fid);
        }
    }
}

When running the code I get this error

Exception in thread "main" ncsa.hdf.hdf5lib.exceptions.HDF5LibraryException
    at ncsa.hdf.hdf5lib.H5.H5Dwrite_int(Native Method)
    at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1139)
    at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1181)
    at no.uib.bccs.esysbio.sample.clients.HDFExtendLDData.writeDataFromFileToInt(HDFExtendLDData.java:145)

The line number in my code corresponds to where I'm writing to the dataset
after I've extended it.

Any tips on how to solve the issue?

cheers, Håkon

[1]
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java
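
For reference, the extend-then-write pattern in [1], adapted to the integer
dataset above, looks roughly like this (a sketch; totalWritten is a new name
and the code is untested):

    // elements already written to the dataset
    long totalWritten = 0;

    // ... once a block of BLOCK_SIZE ints is ready:
    long[] newSize = { totalWritten + BLOCK_SIZE };
    H5.H5Dextend(did_SNP, newSize); // grow the extent past the block to write

    int extended_sid = H5.H5Dget_space(did_SNP); // dataspace after the extend
    H5.H5Sselect_hyperslab(extended_sid, HDF5Constants.H5S_SELECT_SET,
            new long[] { totalWritten }, null, count, null);
    H5.H5Dwrite(did_SNP, type_int_id, msid, extended_sid,
            HDF5Constants.H5P_DEFAULT, currentSNPIdArray);
    H5.H5Sclose(extended_sid);

    totalWritten += BLOCK_SIZE;

Note that the posted code passes the original DIMS ({ 2 }) to H5Dextend on
every iteration, so the dataset never grows beyond two elements; the
hyperslab at start_idx = 2 then falls outside the extent, which would
explain the H5Dwrite failure at that line.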

On 25 March 2010 16:13, Peter Cao <xcao@hdfgroup.org> wrote:

For compression, block size does not matter. Chunk size matters. Usually
larger chunk size tends
to compress better. We usually use 64KB to 1MB for chunk size for better
performance. Try
different chunk size, block size, and compression methods and level to have
the best I/O performance
and compression ratio. As I mentioned earlier, if the content is random,
the compression will not help much.

Thanks
--pc

Håkon Sagehaug wrote:

Hi

Yes the content is more or less a random set of charcters. I'll try some
combinations and see what is the best. We need to transfer the file over a
network, so thats why we need to compress as much as possible. Will the
block size/chunk size have anything to say?

cheers, Håkon

On 25 March 2010 15:48, Peter Cao <xcao@hdfgroup.org <mailto: >> xcao@hdfgroup.org>> wrote:

    Hi Håkon,

    I don't need the code; as long as it works for you, I am happy.

    Deflate level 6 is a good compromise between file size and performance.
    The compression ratio depends on the content. If every string is like a
    random set of characters, the compression will not help much. I will
    leave it to you to try different compression options. If compression
    does not help much, it is better not to use compression at all. It's
    your call.

    Thanks
    --pc

   Håkon Sagehaug wrote:

        Hi Peter

        Thanks for all the help so far. I've added code to write the last
        elements; if you want it, I can paste it in a new email. One more
        question: we need to compress the data. I've now tried this, within
        createDataset(...):

        H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
        H5.H5Pset_chunk(plist, RANK, chunkSize);
        H5.H5Pset_deflate(plist, 9);

        I'm not sure what is most efficient; I tried exchanging
        H5Pset_deflate(plist, 9) with

        H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

        but did not see any difference. I read that szip might be better.
        Without deflate the HDF file is 1.5 GB; with deflate it's 1.3 GB.
        So my hope is that it can be decreased further in size.

        cheers, Håkon

        On 24 March 2010 17:25, Peter Cao <xcao@hdfgroup.org> wrote:

           Hi Håkon,

           Glad to know it works for you. Also, you need to take care of
           the case where the last block does not have the size of
           BLOCK_SIZE. This will happen if the total size (25M) is not
           divisible by BLOCK_SIZE. For better performance, make sure
           that BLOCK_SIZE is divisible by CHUNK_SIZE.

           Thanks
           --pc
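
As a sketch of the partial-last-block handling described here, using the
names from the CreateStrings program quoted below (and assuming Håkon's
start_idx = i + 1 fix): after the loop, idx holds the number of leftover
strings, and the memory dataspace must match that remainder, not BLOCK_SIZE.

if (idx > 0) { /* leftover strings that never filled a whole block */
    long[] remainder = { idx };
    int tail_msid = H5.H5Screate_simple(RANK, remainder, null);

    /* copy the idx valid entries so the buffer matches the selection */
    String[] tail = new String[idx];
    System.arraycopy(strs, 0, tail, 0, idx);

    H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
            new long[] { start_idx }, null, remainder, null);
    H5.H5Dwrite(did, tid, tail_msid, fsid,
            HDF5Constants.H5P_DEFAULT, tail);
    H5.H5Sclose(tail_msid);
}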

          Håkon Sagehaug wrote:

               Hi Peter,

               Thanks so much for the code; it seems to work very well.
               The only thing I found was that the index for the next
               block to write into the HDF array had to be advanced by
               one: when the buffer fills at loop index i, element i has
               already been buffered, so the next block starts at i + 1.
               Instead of

                  start_idx = i;

               I now have

                  start_idx = i + 1;

               cheers, Håkon

               On 24 March 2010 01:19, Peter Cao <xcao@hdfgroup.org> wrote:

                  Hi Håkon,

                  Below is a program that you can start with. I am using
                  variable-length strings. For fixed-length strings there
                  is some extra work: you may have to pad the strings to
                  the same length.

                  You may try different chunk sizes and block sizes to
                  get the best performance.

=======================
import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

public class CreateStrings {

    private final static String H5_FILE = "G:\\temp\\strings.h5";
    private final static String DNAME = "/strs";
    private final static int RANK = 1;
    private final static long[] DIMS = { 25000000 };
    private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
    private final static long[] CHUNK_SIZE = { 25000 };
    private final static int BLOCK_SIZE = 250000;

    private void createDataset(int fid) throws Exception {
        int did = -1, tid = -1, sid = -1, plist = -1;

        try {
            tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
            // use variable length to save space
            H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
            sid = H5.H5Screate_simple(RANK, DIMS, MAX_DIMS);

            // figure out creation properties
            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);

            did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
        } finally {
            try {
                H5.H5Pclose(plist);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void writeData(int fid) throws Exception {
        int did = -1, tid = -1, msid = -1, fsid = -1;
        long[] count = { BLOCK_SIZE };

        try {
            did = H5.H5Dopen(fid, DNAME);
            tid = H5.H5Dget_type(did);
            fsid = H5.H5Dget_space(did);
            msid = H5.H5Screate_simple(RANK, count, null);
            String[] strs = new String[BLOCK_SIZE];

            int idx = 0, block_indx = 0, start_idx = 0;
            long t0 = 0, t1 = 0;
            t0 = System.currentTimeMillis();
            System.out.println("Total number of blocks = "
                    + (DIMS[0] / BLOCK_SIZE));
            for (int i = 0; i < DIMS[0]; i++) {
                strs[idx++] = "str" + i;
                if (idx == BLOCK_SIZE) { // operator % is very expensive
                    idx = 0;
                    H5.H5Sselect_hyperslab(fsid,
                            HDF5Constants.H5S_SELECT_SET,
                            new long[] { start_idx }, null, count, null);
                    H5.H5Dwrite(did, tid, msid, fsid,
                            HDF5Constants.H5P_DEFAULT, strs);

                    if (block_indx == 10) {
                        t1 = System.currentTimeMillis();
                        System.out.println("Total time (minutes) = "
                                + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE))
                                / 1000 / 600);
                    }

                    block_indx++;
                    start_idx = i;
                }
            }
        } finally {
            try {
                H5.H5Sclose(fsid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(msid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void createFile() throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            createDataset(fid);
            writeData(fid);
        } finally {
            H5.H5Fclose(fid);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            (new CreateStrings()).createFile();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
=========================
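
On the fixed-length variant mentioned above, a minimal sketch of the datatype
setup, with a hypothetical maximum length STR_LEN: every value then occupies
exactly STR_LEN bytes (short values are null-padded; longer ones must be
truncated by the caller), and the dataset becomes compressible, unlike
variable-length strings.

final int STR_LEN = 16; /* hypothetical maximum string length */

int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
H5.H5Tset_size(tid, STR_LEN);                         /* fixed size, not H5T_VARIABLE */
H5.H5Tset_strpad(tid, HDF5Constants.H5T_STR_NULLPAD); /* pad short values with nulls */
/* when writing, the buffer is then a byte[N * STR_LEN], one padded slot per string */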


        --
        Håkon Sagehaug, Scientific Programmer
        Parallab, Uni BCCS/Uni Research
        Hakon.Sagehaug@uni.no, phone +47 55584125


Hi Håkon,

HDF5 does not compress variable length data well (basically you are trying to compress
addresses of pointers to variable length data). For best performance, you should not use
compression for variable length strings.

Thanks
--pc
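
One way to check how much a given filter actually buys, assuming an open
dataset identifier did and that this HDF-Java version exposes the
H5Dget_storage_size wrapper: compare the storage actually allocated on disk
against the logical size of the data.

long storage = H5.H5Dget_storage_size(did); /* bytes allocated on disk */
System.out.println("allocated storage: " + storage + " bytes");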


Hi Peter,

My question now is more about dynamically adding data to a dataset when I
don't know beforehand the number of entries I wish to store in the HDF
file. The compression issue is no longer relevant, because the values we
needed to store from the lines in the file were all integers. I'm
struggling to get this part of the code working:

...

/* Need to extend the data set */
H5.H5Dextend(did_SNP, DIMS);
int extended_dataspace_id = H5.H5Dget_space(did_SNP);
H5.H5Sselect_all(extended_dataspace_id);
H5.H5Sselect_hyperslab(extended_dataspace_id,
        HDF5Constants.H5S_SELECT_SET,
        new long[] { start_idx }, null,
        count, null);
H5.H5Dwrite(did_SNP, type_int_id, msid,
        extended_dataspace_id,
        HDF5Constants.H5P_DEFAULT,
        currentSNPIdArray);

cheers, Håkon

···

On 12 April 2010 18:46, Peter Cao <xcao@hdfgroup.org> wrote:

Hi Håkon,

HDF5 does not compress variable length data well (basically you are trying
to compress
addresses of pointers to variable length data). For best performance, you
should not use
compression for variable length strings.

Thanks
--pc

Håkon Sagehaug wrote:

Hi again,

In my current application that you help with, I know how many lines I want
to write before hand, but I also wanted to get working the use case when I
don't know this before hand. So i tried to modify the program you more or
less wrote for me. My test file contains many lines each with one integer on
it like this,

31643
36594
59354
2481
64079
64181
491566836

In the test program below, I've set the initial size to be 2 for the
segments to write to the dataset. After the first segment is written I need
to extend the dataset set, select the portion to write using hyperslab and
the write it, but having some problems. I tried to follow the example here
[1], but have not succeeded. I pasted in the code below

public class HDFExtendLDData { private final static String H5_FILE =
"/scratchtestHap/strings.h5";
   private final static String DNAME_SNP = "/snp.id.one";
   private final static int RANK = 1;
   private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };
   /***
    * Creates a dataset for holding values of type integer, with a given
    * dimension, chucking and a group name.
    *
    * @param fid
    * @param dims
    * @param chunkSize
    * @param groupName
    * @throws Exception
    */
   private void createIntegerDataset(int fid, long[] dims, long[]
chunkSize,
           String groupName) throws Exception {
       int did_snp = -1, type_int_id = -1, sid = -1, plist = -1, group_id
= -1;

       try {
           type_int_id = H5.H5Tcopy(HDF5Constants.H5T_STD_I32LE);

           sid = H5.H5Screate_simple(RANK, dims, MAX_DIMS);

           plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);

           H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);

           H5.H5Pset_chunk(plist, RANK, chunkSize);

           H5.H5Pset_deflate(plist, 6);

           group_id = H5.H5Gcreate(fid, groupName,
HDF5Constants.H5P_DEFAULT);

           did_snp = H5.H5Dcreate(group_id, groupName + DNAME_SNP,
                   type_int_id, sid, plist);

           System.out.println("created for chr " + groupName);

       } finally {
           try {
               H5.H5Pclose(plist);

           } catch (HDF5Exception ex) {
           }
           try {
               H5.H5Sclose(sid);

           } catch (HDF5Exception ex) {
           }
           try {
               H5.H5Dclose(did_snp);

               H5.H5Gclose(group_id);
           } catch (HDF5Exception ex) {
           }

       }
   }
       /***
    * Input a directory path, that contains some files. Need to extract
the
    * data from the files and create one data set for each file within one
hdf
    * file.
    *
    * @param fid
    * Hdf File id
    * @param sourceFolder
    * Path to the folder
    * @throws Exception
    */
   private void writeDataFromFileToInt(int fid, String sourceFolder)
           throws Exception {
       int did_SNP = -1, msid = -1, fsid = -1, timesWritten = 0, group_id
= -1;

       Collection<File> fileCollection = FileUtils.listFiles(new File(
               sourceFolder), new String[] { "txt" }, false);

       int filesAdded = 0;

       /* Loop through a directory of files. */
       for (File sourceFile : fileCollection) {
           try {
               String choromosome_tmp =
sourceFile.getName().split("_")[1];
               String choromosome = choromosome_tmp.substring(3,
                       choromosome_tmp.length());
               choromosome = "/" + choromosome + "/";

               /* Setting the initial size of the data set */
               long[] DIMS = { 2 };
               long[] CHUNK_SIZE = { 2 };
               int BLOCK_SIZE = 2;

               long[] count = { BLOCK_SIZE };

               /* Creates a new data set for the file to parse */
               createIntegerDataset(fid, DIMS, CHUNK_SIZE, choromosome);

               /* open the group that holds the data set */
               group_id = H5.H5Gopen(fid, choromosome);

               /* open the data set */
               did_SNP = H5.H5Dopen(group_id, choromosome + DNAME_SNP);

               /* fetches the data type, should be integer */
               int type_int_id = H5.H5Dget_type(did_SNP);

               fsid = H5.H5Dget_space(did_SNP);

               /* Memeory space */
               msid = H5.H5Screate_simple(RANK, count, null);

               /* Array for storing the values */
               int[] currentSNPIdArray = new int[BLOCK_SIZE];

               /* File to read teh values from */
               BigFile ldFile = new BigFile(sourceFile.getAbsolutePath());

               int idx = 0, block_indx = 0, start_idx = 0;
               System.out.println("Started to parse the file");

               int currentLine = 0;
               timesWritten = 0;

               /* Iterating over each line in the file */
               for (String ldLine : ldFile) {

                   currentSNPIdArray[idx] = Integer.valueOf(ldLine);

                   idx++;

                   if (idx == BLOCK_SIZE) {
                       idx = 0;
                       if (timesWritten == 0) {
                           /* Just write to the data set */
                           H5
                                   .H5Sselect_hyperslab(fsid,
                                           HDF5Constants.H5S_SELECT_SET,
                                           new long[] { start_idx }, null,
                                           count, null);
                           H5.H5Dwrite(did_SNP, type_int_id, msid, fsid,
                                   HDF5Constants.H5P_DEFAULT,
                                   currentSNPIdArray);
                       } else {
                           /* Need to extend the data set */
                           H5.H5Dextend(did_SNP, DIMS);
                           int extended_dataspace_id = H5
                                   .H5Dget_space(did_SNP);
                           H5.H5Sselect_all(extended_dataspace_id);
                           H5

.H5Sselect_hyperslab(extended_dataspace_id,
                                           HDF5Constants.H5S_SELECT_SET,
                                           new long[] { start_idx }, null,
                                           count, null);
                           H5.H5Dwrite(did_SNP, type_int_id, msid,
                                   extended_dataspace_id,
                                   HDF5Constants.H5P_DEFAULT,
                                   currentSNPIdArray);
                       }

                       block_indx++;
                       start_idx = currentLine + 1;
                       timesWritten++;

                   }

                   currentLine++;

               }
               filesAdded++;

               System.out.println("Finished parsing the file ");

           } finally {
               try {
                   H5.H5Gclose(group_id);
                   H5.H5Sclose(fsid);

               } catch (HDF5Exception ex) {
               }
               try {
                   H5.H5Sclose(msid);
               } catch (HDF5Exception ex) {
               }
               try {
                   H5.H5Dclose(did_SNP);
               } catch (HDF5Exception ex) {
               }
           }
       }

   }

   public void createFile(String sourceFile) throws Exception {
       int fid = -1;

       fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
               HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

       if (fid < 0)
           return;

       try {
           writeDataFromFileToInt(fid, sourceFile);
       } finally {
           H5.H5Fclose(fid);
       }
   }

When running the code I get this error

Exception in thread "main"
ncsa.hdf.hdf5lib.exceptions.HDF5LibraryException
   at ncsa.hdf.hdf5lib.H5.H5Dwrite_int(Native Method)
   at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1139)
   at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1181)
   at
no.uib.bccs.esysbio.sample.clients.HDFExtendLDData.writeDataFromFileToInt(HDFExtendLDData.java:145)

The line number in my code corresponds to where I'm writing to the dataset
after I've extended it.

Any tips on how to solve the issue?

cheers, Håkon

[1]
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java

On 25 March 2010 16:13, Peter Cao <xcao@hdfgroup.org <mailto: >> xcao@hdfgroup.org>> wrote:

   For compression, block size does not matter. Chunk size matters.
   Usually larger chunk size tends
   to compress better. We usually use 64KB to 1MB for chunk size for
   better performance. Try
   different chunk size, block size, and compression methods and
   level to have the best I/O performance
   and compression ratio. As I mentioned earlier, if the content is
   random, the compression will not help much.

   Thanks
   --pc

   Håkon Sagehaug wrote:

       Hi

       Yes the content is more or less a random set of charcters.
       I'll try some combinations and see what is the best. We need
       to transfer the file over a network, so thats why we need to
       compress as much as possible. Will the block size/chunk size
       have anything to say?

       cheers, Håkon

       On 25 March 2010 15:48, Peter Cao <xcao@hdfgroup.org >> <mailto:xcao@hdfgroup.org> <mailto:xcao@hdfgroup.org >> <mailto:xcao@hdfgroup.org>>> wrote:

          Hi Håkon,

          I don't need the code. As long as it works for you. I am happy.

          Deflate level 6 is a good combination between file size and
          performance.
          The compression ratio depends on the content. If every
       string is
          like a
          random set of characters, the compression will not do much
       help. I
          will leave
          it to you to try different compression options. If compression
          does not do
          much help, it will be much better not using compression at all.
          It's your call.

          Thanks
          --pc

          Håkon Sagehaug wrote:

              Hi Peter

              Thanks for all the help so far, I've added code to add the
              last elements, if you want to have it i can past it in
       a new
              email to you. One more question, we need to compress
       the data
              I've now tried like this, within createDataset(...)

              H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
              H5.H5Pset_chunk(plist, RANK, chunkSize);
              H5.H5Pset_deflate(plist, 9);

              I'm not sure what is the most efficient way, tried to
       exchange
              the H5Pset_deflate(plist, 9) with

              H5.H5Pset_szip(plist,
       HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

              but did not see any diffrens. I read the szip would
       maybe be
              better. If I don't use deflate the hdf file is 1.5 gb with
              deflate it's 1.3 gb. So my hopes is that it can be further
              decreased in size.

              cheers, Håkon

              On 24 March 2010 17:25, Peter Cao <xcao@hdfgroup.org
       <mailto:xcao@hdfgroup.org>
              <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>>
       <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>
              <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>>>>
       wrote:

                 Hi Håkon,

                 Glad to know it work for you. Also you need to take
       care of the
                 case that
                 the last block does not have the size of BLOCK_SIZE.
       This
              will happen
                 if the total size (25M) is not divided by
       BLOCK_SIZE. For
              better
                 performance,
                 make sure that BLOCK_SIZE is divided by CHUNK_SIZE.

                 Thanks
                 --pc

                 Håkon Sagehaug wrote:

                     Hi Peter,

                     Thanks so much for the code, seems to work very
       well,
              the only
                     thing I found was that when the index for next
       index to
              write
                     in the hdf array, I had to add 1 to it, so
       instead of

                        start_idx = i;

                     I now have

                        start_idx = i + 1;

                     cheers, Håkon

                     On 24 March 2010 01:19, Peter Cao
       <xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>
              <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>>
                     <mailto:xcao@hdfgroup.org
       <mailto:xcao@hdfgroup.org> <mailto:xcao@hdfgroup.org
       <mailto:xcao@hdfgroup.org>>>
              <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>
       <mailto:xcao@hdfgroup.org <mailto:xcao@hdfgroup.org>>

                     <mailto:xcao@hdfgroup.org
       <mailto:xcao@hdfgroup.org> <mailto:xcao@hdfgroup.org
       <mailto:xcao@hdfgroup.org>>>>>

              wrote:

                        Hi Håkon,

                        Below is the program that you can start with.
       I am using
                     variable
                        length strings.
                        For fixed length strings, there are some
       extra work. You
                     may have
                        to make the
                        strings to the same length.

                        You may try different chunk sizes and block
       sizes to
              have
                     the best
                        performance.

                        =======================
                        import ncsa.hdf.hdf5lib.H5;
                        import ncsa.hdf.hdf5lib.HDF5Constants;
                        import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

                        public class CreateStrings {

                          private final static String H5_FILE =
              "G:\\temp\\strings.h5";
                          private final static String DNAME = "/strs";
                          private final static int RANK = 1;
                          private final static long[] DIMS = {
       25000000 };
                          private final static long[] MAX_DIMS = {
                        HDF5Constants.H5S_UNLIMITED };
                          private final static long[] CHUNK_SIZE = {
       25000 };
                          private final static int BLOCK_SIZE = 250000;

                          private void createDataset(int fid) throws
       Exception {
                              int did = -1, tid = -1, sid = -1, plist
       = -1;

                              try {

                                  tid =
       H5.H5Tcopy(HDF5Constants.H5T_C_S1);
                                  // use variable length to save space
                                  H5.H5Tset_size(tid,
              HDF5Constants.H5T_VARIABLE);
                                  sid = H5.H5Screate_simple(RANK, DIMS,
              MAX_DIMS);

                                  // figure out creation properties

                                  plist =
        H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, RANK, CHUNK_SIZE);

            did = H5.H5Dcreate(fid, DNAME, tid, sid, plist);
        } finally {
            try {
                H5.H5Pclose(plist);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void writeData(int fid) throws Exception {
        int did = -1, tid = -1, msid = -1, fsid = -1;
        long[] count = { BLOCK_SIZE };

        try {
            did = H5.H5Dopen(fid, DNAME);
            tid = H5.H5Dget_type(did);
            fsid = H5.H5Dget_space(did);
            msid = H5.H5Screate_simple(RANK, count, null);
            String[] strs = new String[BLOCK_SIZE];

            int idx = 0, block_indx = 0, start_idx = 0;
            long t0 = 0, t1 = 0;
            t0 = System.currentTimeMillis();
            System.out.println("Total number of blocks = "
                    + (DIMS[0] / BLOCK_SIZE));
            for (int i = 0; i < DIMS[0]; i++) {
                strs[idx++] = "str" + i;
                if (idx == BLOCK_SIZE) { // operator % is very expensive
                    idx = 0;
                    H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                            new long[] { start_idx }, null, count, null);
                    H5.H5Dwrite(did, tid, msid, fsid,
                            HDF5Constants.H5P_DEFAULT, strs);

                    if (block_indx == 10) {
                        t1 = System.currentTimeMillis();
                        System.out.println("Total time (minutes) = "
                                + ((t1 - t0) * (DIMS[0] / BLOCK_SIZE)) / 1000
                                / 600);
                    }

                    block_indx++;
                    start_idx = i;
                }
            }
        } finally {
            try {
                H5.H5Sclose(fsid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(msid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did);
            } catch (HDF5Exception ex) {
            }
        }
    }

    private void createFile() throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            createDataset(fid);
            writeData(fid);
        } finally {
            H5.H5Fclose(fid);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            (new CreateStrings()).createFile();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
                        =========================


Hi Håkon,

Make sure DIMS holds the total dimension size when you call H5.H5Dextend(did_SNP, DIMS),
i.e. the new size = the old size + the increase.
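
For illustration, a minimal sketch of that pattern using the variable names
from the snippet quoted below (did_SNP, type_int_id, msid, count, start_idx);
the running-total arithmetic (newDims) is my addition, not code from the thread:

/* Sketch: extend to the new TOTAL size (rows written so far + this
 * block), then re-read the enlarged file space, select the new region
 * and write. */
long[] newDims = { start_idx + BLOCK_SIZE };
H5.H5Dextend(did_SNP, newDims);

int extended_dataspace_id = H5.H5Dget_space(did_SNP);
H5.H5Sselect_hyperslab(extended_dataspace_id,
        HDF5Constants.H5S_SELECT_SET,
        new long[] { start_idx }, null, count, null);
H5.H5Dwrite(did_SNP, type_int_id, msid, extended_dataspace_id,
        HDF5Constants.H5P_DEFAULT, currentSNPIdArray);
H5.H5Sclose(extended_dataspace_id);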

Thanks
--pc

Håkon Sagehaug wrote:


Hi Peter,

My question now is more about dynamically adding data to a dataset when I don't know the number of entries in the table I wish to store in the HDF file. The compression issue is no longer relevant, because the values we needed to store from the lines in the file were all integers. I'm struggling to get this part of the code working for me:

...

/* Need to extend the data set */
H5.H5Dextend(did_SNP, DIMS);
int extended_dataspace_id = H5.H5Dget_space(did_SNP);
H5.H5Sselect_all(extended_dataspace_id);
H5.H5Sselect_hyperslab(extended_dataspace_id,
        HDF5Constants.H5S_SELECT_SET,
        new long[] { start_idx }, null, count, null);
H5.H5Dwrite(did_SNP, type_int_id, msid,
        extended_dataspace_id,
        HDF5Constants.H5P_DEFAULT,
        currentSNPIdArray);

cheers, Håkon

On 12 April 2010 18:46, Peter Cao <xcao@hdfgroup.org> wrote:

    Hi Håkon,

    HDF5 does not compress variable length data well (basically you
    are trying to compress
    addresses of pointers to variable length data). For best
    performance, you should not use
    compression for variable length strings.
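
     As an illustration of the fixed-length alternative (a sketch, not code
     from the thread — the 16-byte width, dataset name and helper are all
     made up): store each string as a padded N-byte field, so the dataset
     holds the characters themselves and deflate has real bytes to work on.

     /* Hypothetical helper: write strs as fixed-length, zero-padded
      * 16-byte strings. fid is an open file id; pass a chunked+deflate
      * creation plist instead of H5P_DEFAULT to enable compression. */
     static void writeFixedStrings(int fid, String[] strs) throws Exception {
         final int STR_LEN = 16; // made-up width; must cover the longest string
         int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
         H5.H5Tset_size(tid, STR_LEN); // fixed size instead of H5T_VARIABLE

         byte[] buf = new byte[strs.length * STR_LEN]; // zero-padded packing
         for (int i = 0; i < strs.length; i++) {
             byte[] b = strs[i].getBytes();
             System.arraycopy(b, 0, buf, i * STR_LEN, Math.min(b.length, STR_LEN));
         }

         int sid = H5.H5Screate_simple(1, new long[] { strs.length }, null);
         int did = H5.H5Dcreate(fid, "/fixed_strs", tid, sid,
                 HDF5Constants.H5P_DEFAULT);
         H5.H5Dwrite(did, tid, HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                 HDF5Constants.H5P_DEFAULT, buf);

         H5.H5Dclose(did);
         H5.H5Sclose(sid);
         H5.H5Tclose(tid);
     }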

    Thanks
    --pc

    Håkon Sagehaug wrote:

        Hi again,

        In my current application, which you helped with, I know beforehand
        how many lines I want to write, but I also wanted to get the use
        case working where I don't know this in advance. So I tried to
        modify the program you more or less wrote for me. My test file
        contains many lines, each with one integer on it, like this:

        31643
        36594
        59354
        2481
        64079
        64181
        491566836

        In the test program below, I've set the initial size of the
        segments written to the dataset to 2. After the first segment is
        written I need to extend the dataset, select the portion to write
        using a hyperslab, and then write it, but I'm having some problems.
        I tried to follow the example here [1], but have not succeeded.
        I pasted in the code below.

import java.io.File;
import java.util.Collection;

import org.apache.commons.io.FileUtils;

import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;

public class HDFExtendLDData {

    private final static String H5_FILE = "/scratchtestHap/strings.h5";
    private final static String DNAME_SNP = "/snp.id.one";
    private final static int RANK = 1;
    private final static long[] MAX_DIMS = { HDF5Constants.H5S_UNLIMITED };

    /***
     * Creates a dataset for holding values of type integer, with a given
     * dimension, chunking and a group name.
     *
     * @param fid
     * @param dims
     * @param chunkSize
     * @param groupName
     * @throws Exception
     */
    private void createIntegerDataset(int fid, long[] dims, long[] chunkSize,
            String groupName) throws Exception {
        int did_snp = -1, type_int_id = -1, sid = -1, plist = -1, group_id = -1;

        try {
            type_int_id = H5.H5Tcopy(HDF5Constants.H5T_STD_I32LE);

            sid = H5.H5Screate_simple(RANK, dims, MAX_DIMS);

            plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
            H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
            H5.H5Pset_chunk(plist, RANK, chunkSize);
            H5.H5Pset_deflate(plist, 6);

            group_id = H5.H5Gcreate(fid, groupName, HDF5Constants.H5P_DEFAULT);

            did_snp = H5.H5Dcreate(group_id, groupName + DNAME_SNP,
                    type_int_id, sid, plist);

            System.out.println("created for chr " + groupName);
        } finally {
            try {
                H5.H5Pclose(plist);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Sclose(sid);
            } catch (HDF5Exception ex) {
            }
            try {
                H5.H5Dclose(did_snp);
                H5.H5Gclose(group_id);
            } catch (HDF5Exception ex) {
            }
        }
    }

    /***
     * Input a directory path that contains some files. Extracts the data
     * from the files and creates one dataset per file within one HDF file.
     *
     * @param fid
     *            HDF file id
     * @param sourceFolder
     *            path to the folder
     * @throws Exception
     */
    private void writeDataFromFileToInt(int fid, String sourceFolder)
            throws Exception {
        int did_SNP = -1, msid = -1, fsid = -1, timesWritten = 0, group_id = -1;

        Collection<File> fileCollection = FileUtils.listFiles(new File(
                sourceFolder), new String[] { "txt" }, false);

        int filesAdded = 0;

        /* Loop through a directory of files. */
        for (File sourceFile : fileCollection) {
            try {
                String choromosome_tmp = sourceFile.getName().split("_")[1];
                String choromosome = choromosome_tmp.substring(3,
                        choromosome_tmp.length());
                choromosome = "/" + choromosome + "/";

                /* Setting the initial size of the data set */
                long[] DIMS = { 2 };
                long[] CHUNK_SIZE = { 2 };
                int BLOCK_SIZE = 2;

                long[] count = { BLOCK_SIZE };

                /* Creates a new data set for the file to parse */
                createIntegerDataset(fid, DIMS, CHUNK_SIZE, choromosome);

                /* Open the group that holds the data set */
                group_id = H5.H5Gopen(fid, choromosome);

                /* Open the data set */
                did_SNP = H5.H5Dopen(group_id, choromosome + DNAME_SNP);

                /* Fetch the data type, should be integer */
                int type_int_id = H5.H5Dget_type(did_SNP);

                fsid = H5.H5Dget_space(did_SNP);

                /* Memory space */
                msid = H5.H5Screate_simple(RANK, count, null);

                /* Array for storing the values */
                int[] currentSNPIdArray = new int[BLOCK_SIZE];

                /* File to read the values from; BigFile is the author's own
                 * helper that iterates over the lines of a file. */
                BigFile ldFile = new BigFile(sourceFile.getAbsolutePath());

                int idx = 0, block_indx = 0, start_idx = 0;
                System.out.println("Started to parse the file");

                int currentLine = 0;
                timesWritten = 0;

                /* Iterate over each line in the file */
                for (String ldLine : ldFile) {
                    currentSNPIdArray[idx] = Integer.valueOf(ldLine);
                    idx++;

                    if (idx == BLOCK_SIZE) {
                        idx = 0;
                        if (timesWritten == 0) {
                            /* Just write to the data set */
                            H5.H5Sselect_hyperslab(fsid,
                                    HDF5Constants.H5S_SELECT_SET,
                                    new long[] { start_idx }, null,
                                    count, null);
                            H5.H5Dwrite(did_SNP, type_int_id, msid, fsid,
                                    HDF5Constants.H5P_DEFAULT,
                                    currentSNPIdArray);
                        } else {
                            /* Need to extend the data set */
                            H5.H5Dextend(did_SNP, DIMS);
                            int extended_dataspace_id = H5
                                    .H5Dget_space(did_SNP);
                            H5.H5Sselect_all(extended_dataspace_id);
                            H5.H5Sselect_hyperslab(extended_dataspace_id,
                                    HDF5Constants.H5S_SELECT_SET,
                                    new long[] { start_idx }, null,
                                    count, null);
                            H5.H5Dwrite(did_SNP, type_int_id, msid,
                                    extended_dataspace_id,
                                    HDF5Constants.H5P_DEFAULT,
                                    currentSNPIdArray);
                        }

                        block_indx++;
                        start_idx = currentLine + 1;
                        timesWritten++;
                    }

                    currentLine++;
                }
                filesAdded++;

                System.out.println("Finished parsing the file");
            } finally {
                try {
                    H5.H5Gclose(group_id);
                    H5.H5Sclose(fsid);
                } catch (HDF5Exception ex) {
                }
                try {
                    H5.H5Sclose(msid);
                } catch (HDF5Exception ex) {
                }
                try {
                    H5.H5Dclose(did_SNP);
                } catch (HDF5Exception ex) {
                }
            }
        }
    }

    public void createFile(String sourceFile) throws Exception {
        int fid = -1;

        fid = H5.H5Fcreate(H5_FILE, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);

        if (fid < 0)
            return;

        try {
            writeDataFromFileToInt(fid, sourceFile);
        } finally {
            H5.H5Fclose(fid);
        }
    }
}

        When running the code I get this error:

        Exception in thread "main" ncsa.hdf.hdf5lib.exceptions.HDF5LibraryException
            at ncsa.hdf.hdf5lib.H5.H5Dwrite_int(Native Method)
            at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1139)
            at ncsa.hdf.hdf5lib.H5.H5Dwrite(H5.java:1181)
            at no.uib.bccs.esysbio.sample.clients.HDFExtendLDData.writeDataFromFileToInt(HDFExtendLDData.java:145)

        The line number in my code corresponds to where I'm writing to
        the dataset after I've extended it.

        Any tips on how to solve the issue?

        cheers, Håkon

        [1]
        http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java

        On 25 March 2010 16:13, Peter Cao <xcao@hdfgroup.org> wrote:

           For compression, block size does not matter; chunk size matters.
           A larger chunk size usually tends to compress better. We usually
           use 64 KB to 1 MB chunks for better performance. Try different
           chunk sizes, block sizes, and compression methods and levels to
           find the best I/O performance and compression ratio. As I
           mentioned earlier, if the content is random, compression will
           not help much.
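
           To make that concrete (the arithmetic is mine, not Peter's): with
           4-byte integers, a 64 KB chunk is 16384 elements and a 1 MB chunk
           is 262144 elements, so a creation property list might look like:

           /* Sketch: chunk size derived from the 64KB-1MB guidance for a
            * 1-D dataset of 4-byte ints; deflate level 6 as used elsewhere
            * in the thread. */
           long[] chunkDims = { 64 * 1024 / 4 }; // 16384 ints = 64 KB per chunk
           int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
           H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
           H5.H5Pset_chunk(plist, 1, chunkDims);
           H5.H5Pset_deflate(plist, 6);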

           Thanks
           --pc

           Håkon Sagehaug wrote:

               Hi

               Yes, the content is more or less a random set of characters.
               I'll try some combinations and see what works best. We need
               to transfer the file over a network, so that's why we need
               to compress as much as possible. Does the block size/chunk
               size make any difference?

               cheers, Håkon

               On 25 March 2010 15:48, Peter Cao <xcao@hdfgroup.org> wrote:

                  Hi Håkon,

                   I don't need the code; as long as it works for you, I am
                   happy.

                   Deflate level 6 is a good compromise between file size
                   and performance. The compression ratio depends on the
                   content: if every string is like a random set of
                   characters, compression will not help much. I will leave
                   it to you to try different compression options. If
                   compression does not help much, it is much better not to
                   use compression at all. It's your call.

                  Thanks
                  --pc

                  Håkon Sagehaug wrote:

                      Hi Peter

                       Thanks for all the help so far. I've added code to
                       write the last elements; if you want it, I can paste
                       it in a new email to you. One more question: we need
                       to compress the data. I've now tried this, within
                       createDataset(...):

                       H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
                       H5.H5Pset_chunk(plist, RANK, chunkSize);
                       H5.H5Pset_deflate(plist, 9);

                       I'm not sure what is the most efficient way. I tried
                       to exchange the H5Pset_deflate(plist, 9) with

                       H5.H5Pset_szip(plist, HDF5Constants.H5_SZIP_NN_OPTION_MASK, 8);

                       but did not see any difference; I read that szip
                       might be better. Without deflate the HDF file is
                       1.5 GB, with deflate it's 1.3 GB, so my hope is that
                       it can be decreased further in size.

                      cheers, Håkon

                       On 24 March 2010 17:25, Peter Cao <xcao@hdfgroup.org> wrote:

                          Hi Håkon,

                          Glad to know it works for you. You also need to
                          take care of the case where the last block does
                          not have the full BLOCK_SIZE; this will happen if
                          the total size (25M) is not divisible by
                          BLOCK_SIZE. For better performance, make sure
                          that BLOCK_SIZE is divisible by CHUNK_SIZE.
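
                          For illustration, a sketch of one way to flush
                          that trailing partial block (variable names follow
                          the writeData example; the flush itself is my
                          addition, not code from the thread):

                          /* After the main loop: if idx elements are left
                           * in strs, shrink the memory space and the file
                           * selection to idx and write once more. */
                          if (idx > 0) {
                              long[] tail = { idx };
                              int tailMsid = H5.H5Screate_simple(RANK, tail, null);
                              H5.H5Sselect_hyperslab(fsid, HDF5Constants.H5S_SELECT_SET,
                                      new long[] { start_idx }, null, tail, null);
                              String[] rest = new String[idx];
                              System.arraycopy(strs, 0, rest, 0, idx); // only the filled entries
                              H5.H5Dwrite(did, tid, tailMsid, fsid,
                                      HDF5Constants.H5P_DEFAULT, rest);
                              H5.H5Sclose(tailMsid);
                          }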

                         Thanks
                         --pc

                         Håkon Sagehaug wrote:

                              Hi Peter,

                              Thanks so much for the code; it seems to work
                              very well. The only thing I found was that the
                              index for the next position to write in the
                              HDF array needed 1 added to it, so instead of

                                 start_idx = i;

                              I now have

                                 start_idx = i + 1;

                             cheers, Håkon

                              On 24 March 2010 01:19, Peter Cao <xcao@hdfgroup.org> wrote:

                                 Hi Håkon,

                                 Below is the program you can start with. I
                                 am using variable-length strings; for
                                 fixed-length strings there is some extra
                                 work, as you may have to make the strings
                                 all the same length.

                                 You may try different chunk sizes and block
                                 sizes to get the best performance.

                                 =======================
                                 [The CreateStrings example program was
                                 quoted here in full; it is the same program
                                 shown earlier in the thread.]
                                 =========================
