Hello HDF5 community, Quincey,
I have tested versions 1.8.16 and 1.10.1, also with the h5pset_libver_bounds_f
subroutine.
I have inserted these calls into my bench program:
call h5open_f(error)
call h5pcreate_f( H5P_FILE_ACCESS_F, fapl_id, error)
call h5pset_libver_bounds_f(fapl_id, H5F_LIBVER_LATEST_F, &
                            H5F_LIBVER_LATEST_F, error)
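For reference, here is how I understand these calls are supposed to fit together with
the file creation (a trimmed sketch rather than my actual bench program; as far as I
know, fapl_id has to be passed to h5fcreate_f as access_prp for the version bounds to
take effect):

use hdf5
integer(hid_t) :: fapl_id, file_id
integer        :: error

call h5open_f(error)
call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, error)
call h5pset_libver_bounds_f(fapl_id, H5F_LIBVER_LATEST_F, &
                            H5F_LIBVER_LATEST_F, error)
! pass the property list as access_prp so the bounds apply to the new file
call h5fcreate_f("results.h5", H5F_ACC_TRUNC_F, file_id, error, &
                 access_prp=fapl_id)
! ... create and write the datasets here ...
call h5fclose_f(file_id, error)
call h5pclose_f(fapl_id, error)
call h5close_f(error)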
However, I can't see any difference in the size of the generated HDF5 files.
Below are the sizes and md5sums of the generated HDF5 files for the two HDF5
library versions and different numbers of elements (0, 1 and 2) in each dataset.
Version 1.8.16
$ ./bench.exe 0 && md5sum results.h5 && ls -altr results.h5
ee8157f1ce74936021b1958fb796741e *results.h5
-rw-r--r-- 1 xxxxx 1049089 1169632 May 24 09:17 results.h5
$ ./bench.exe 1 && md5sum results.h5 && ls -altr results.h5
1790a5650bb945b17c0f8a4e59adec85 *results.h5
-rw-r--r-- 1 xxxxx 1049089 7481632 May 24 09:17 results.h5
$ ./bench.exe 2 && md5sum results.h5 && ls -altr results.h5
7d3dff2c6a1c29fa0fe827e4bd5ba79e *results.h5
-rw-r--r-- 1 xxxxx 1049089 7505632 May 24 09:17 results.h5
Version 1.10.1
$ ./bench.exe 0 && md5sum results.h5 && ls -altr results.h5
ec8169773b9ea015c81fc4cb2205d727 *results.h5
-rw-r--r-- 1 xxxxx 1049089 1169632 May 24 09:12 results.h5
$ ./bench.exe 1 && md5sum results.h5 && ls -altr results.h5
fae64160fe79f4af0ef382fd1790bf76 *results.h5
-rw-r--r-- 1 xxxxx 1049089 7481632 May 24 09:14 results.h5
$ ./bench.exe 2 && md5sum results.h5 && ls -altr results.h5
20aaf160b3d8ab794ab8c14a604dacc5 *results.h5
-rw-r--r-- 1 xxxxx 1049089 7505632 May 24 09:14 results.h5
···
2017-05-23 19:12 GMT+02:00 Guillaume Jacquenot <guillaume.jacquenot@gmail.com>:
Hello Quincey
I am using version 1.8.16.
I am using a chunk size of 1.
I have tried contiguous datasets, but I get an error at runtime (see below).
I have written a test program that creates 3000 datasets filled with 64-bit
floating point numbers.
I can specify the number n, which controls the number of times I save my
data (the number of timesteps of a simulation in my case). To sum up, the test program does:
call hdf5_init(filename)
do i = 1, n
    call hdf5_write(datatosave)
end do
call hdf5_close()

With n = 0, I get an HDF5 file of size 1.11 MB, which corresponds to about
370 bytes per empty dataset (totally reasonable).
With n = 1, I get an HDF5 file of size 7.13 MB, which surprises me. Why
such an increase?
With n = 2, I get an HDF5 file of size 7.15 MB, an increase of 0.02 MB,
which is logical: 3000*8*1/1e6 = 0.024 MB.

When setting the chunk size to 10, I obtain the following results:
With n = 0, I get an HDF5 file of size 1.11 MB, which again corresponds to about
370 bytes per empty dataset.
With n = 1, I get an HDF5 file of size 7.34 MB, which surprises me.
With n = 2, I get an HDF5 file of size 7.15 MB, which leads to an
increase of 3000*8*10/1e6 MB, which is logical.

I don't understand the first increase in size. It does not make this data
storage very efficient.
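To make the pattern concrete, each dataset is an extendible, chunked 1-D dataset that
is grown by one element per timestep. The write step looks roughly like this (a
simplified illustration, not my exact wrapper code; the dataset name, loop index and
buffer are placeholders):

use hdf5
integer(hid_t)   :: file_id, space_id, dcpl_id, dset_id, fspace_id, mspace_id
integer(hsize_t) :: dims(1), maxdims(1), chunk(1), offset(1), count(1), newsize(1)
integer(hsize_t) :: i        ! current timestep index (1-based)
integer          :: error
real(8)          :: buf(1)   ! value to append at this timestep

! file_id is the already-opened HDF5 file.
! Create an empty, extendible, chunked 1-D dataset (chunk size 1 here).
dims(1)    = 0
maxdims(1) = H5S_UNLIMITED_F
chunk(1)   = 1
call h5screate_simple_f(1, dims, space_id, error, maxdims)
call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, error)
call h5pset_chunk_f(dcpl_id, 1, chunk, error)
call h5dcreate_f(file_id, "signal_0001", H5T_NATIVE_DOUBLE, space_id, &
                 dset_id, error, dcpl_id)

! At timestep i: grow the dataset to length i and write the new element.
newsize(1) = i
call h5dset_extent_f(dset_id, newsize, error)
call h5dget_space_f(dset_id, fspace_id, error)
offset(1) = i - 1
count(1)  = 1
call h5sselect_hyperslab_f(fspace_id, H5S_SELECT_SET_F, offset, count, error)
call h5screate_simple_f(1, count, mspace_id, error)
call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, count, error, &
                mspace_id, fspace_id)
call h5sclose_f(mspace_id, error)
call h5sclose_f(fspace_id, error)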
Do you think a compound dataset with 3000 columns would present the same
behaviour? I have not tried it, since I don't know how to map the content of an
array when calling the h5dwrite_f function for a compound dataset.

If I ask for 30000 datasets, I observe the same behaviour:
n = 0 -> 10.9 MB
n = 1 -> 73.2 MB

Thanks
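P.S. On the compound-dataset question above: would a field-by-field write along these
lines work? This mirrors the pattern of the HDF5 Fortran compound example as far as I
understand it (two made-up fields only, and I have not tried it):

use hdf5
integer(hid_t)   :: file_id, space_id, dset_id, ctype_id, mtype_x
integer(size_t)  :: dsize, csize, offset
integer(hsize_t) :: dims(1)
integer          :: error
real(8)          :: x_values(21), y_values(21)

! file_id is the already-opened HDF5 file.
! Compound file datatype with two double fields, "x" and "y".
call h5tget_size_f(H5T_NATIVE_DOUBLE, dsize, error)
csize = 2*dsize
call h5tcreate_f(H5T_COMPOUND_F, csize, ctype_id, error)
offset = 0
call h5tinsert_f(ctype_id, "x", offset, H5T_NATIVE_DOUBLE, error)
offset = dsize
call h5tinsert_f(ctype_id, "y", offset, H5T_NATIVE_DOUBLE, error)

dims(1) = 21
call h5screate_simple_f(1, dims, space_id, error)
call h5dcreate_f(file_id, "table", ctype_id, space_id, dset_id, error)

! Write the "x" member from a plain array, via a one-member memory type.
call h5tcreate_f(H5T_COMPOUND_F, dsize, mtype_x, error)
offset = 0
call h5tinsert_f(mtype_x, "x", offset, H5T_NATIVE_DOUBLE, error)
call h5dwrite_f(dset_id, mtype_x, x_values, dims, error)
! ... repeat with a one-member type for "y" and y_values ...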
Here is the error I get with a contiguous dataset:
#001: hdf5-1.8.16/src/H5Dint.c line 453 in H5D__create_named(): unable
to create and link to dataset
major: Dataset
minor: Unable to initialize object
#002: hdf5-1.8.16/src/H5L.c line 1638 in H5L_link_object(): unable to
create new link to object
major: Links
minor: Unable to initialize object
#003: hdf5-1.8.16/src/H5L.c line 1882 in H5L_create_real(): can't insert
link
major: Symbol table
minor: Unable to insert object
#004: hdf5-1.8.16/src/H5Gtraverse.c line 861 in H5G_traverse():
internal path traversal failed
major: Symbol table
minor: Object not found
#005: hdf5-1.8.16/src/H5Gtraverse.c line 641 in H5G_traverse_real():
traversal operator failed
major: Symbol table
minor: Callback failed
#006: hdf5-1.8.16/src/H5L.c line 1685 in H5L_link_cb(): unable to create
object
major: Object header
minor: Unable to initialize object
#007: hdf5-1.8.16/src/H5O.c line 3016 in H5O_obj_create(): unable to
open object
major: Object header
minor: Can't open object
#008: hdf5-1.8.16/src/H5Doh.c line 293 in H5O__dset_create(): unable to
create dataset
major: Dataset
minor: Unable to initialize object
#009: hdf5-1.8.16/src/H5Dint.c line 1056 in H5D__create(): unable to
construct layout information
major: Dataset
minor: Unable to initialize object
#010: hdf5-1.8.16/src/H5Dcontig.c line 422 in H5D__contig_construct():
extendible contiguous non-external dataset
major: Dataset
minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.8.16) t^C

2017-05-23 19:00 GMT+02:00 <hdf-forum-request@lists.hdfgroup.org>:
Date: Tue, 23 May 2017 08:22:59 -0700
From: Quincey Koziol <koziol@lbl.gov>
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] Questions about size of generated Hdf5 files
Hi Guillaume,
Are you using chunked or contiguous datasets? If chunked, what
size are you using? Also, can you use the "latest" version of the format,
which should be smaller, but is only compatible with HDF5 1.10.x or later?
(i.e. H5Pset_libver_bounds with "latest" for low and high bounds,
https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_libver_bounds.htm)

Quincey
> On May 23, 2017, at 3:02 AM, Guillaume Jacquenot <guillaume.jacquenot@gmail.com> wrote:
>
> Hello everyone!
>
> I am creating an HDF5 file from a Fortran program, and I am confused
about the size of my generated HDF5 file.
>
> I am writing 19000 datasets, each with 21 values of 64-bit real numbers.
> I write one value at a time, extending each of the 19000 datasets by one
element every time.
> All data are correctly written.
> But the generated file is more than 48 MB.
> I expected the total size of the file to be a little bigger than the
raw data, about 3.2 MB (21*19000*8 / 1e6 = 3.192 MB).
> If I only create 19000 empty datasets, I obtain a 6 MB HDF5 file, which
means each empty dataset is about 400 bytes.
> I guess I could create a ~10 MB (6 MB + 3.2 MB) HDF5 file that could
contain everything.
>
> For comparison, if I write everything in a text file, where each real
number is written with 15 characters, I obtain a 6 MB CSV file.
>
> Question 1)
> Is this behaviour normal?
>
> Question 2)
> Can extending a dataset each time we write data into it significantly
increase the total required disk space?
> Can preallocating the dataset and using hyperslabs save some space?
> Can the chunk parameters impact the size of the generated HDF5 file?
>
> Question 3)
> If I pack everything in a compound dataset with 19000 columns, will the
resulting file be smaller?
>
> N.B:
> When looking at the example that generates 100000 groups (grplots.c), the
size of the generated HDF5 file is 78 MB for 100000 empty groups.
> That means each group is about 780 bytes.
> https://support.hdfgroup.org/ftp/HDF5/examples/howto/crtmany/grplots.c
>
> Guillaume Jacquenot