Hi everyone,
I just spent some time looking for a command-line tool that shows the
size occupied by each dataset in a file. I didn't find anything. The
most promising candidates were h5stat, h5ls, and h5dump, but it seems
that none of them can provide the information I am looking for.
Is there perhaps a third-party tool for that purpose?
I realize that "size" can be defined in lots of ways, but I don't
really care about the details. I have lots of files that each contain
hundreds of datasets, of which most are small but a few are very big.
I am looking for a simple way to identify the big ones. My ideal
definition of size is "how much smaller would the file be if dataset X
were not in there".
Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: research AT khinsen DOT fastmail DOT net
http://dirac.cnrs-orleans.fr/~hinsen/
ORCID: http://orcid.org/0000-0003-0330-9428
Twitter: @khinsen
---------------------------------------------------------------------
"h5dump -p" gives you a per dataset storage_layout information which contains the SIZE and OFFSET of the dataset. I always use it with "-H" command so that it just prints the header of the HDF5 file. For example:
h5dump -pH sample_dataset.h5
Hope this helps,
Babak
h5dump with the -p option may give you what you want. Combining it with -H will remove the data from the output; add -d to limit the output to a specific dataset, or -g for a specific group. The output for a dataset looks like this:
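(Illustrative values only; the real numbers depend on the file. For a
contiguous dataset the relevant block is:)

    STORAGE_LAYOUT {
       CONTIGUOUS
       SIZE 800
       OFFSET 2048
    }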
"h5ls -v" provides an estimate of the datasets in a file. Is not this what you are looking for?
Greetings, Richard
> "h5ls -v" provides an estimate of the datasets in a file. Is not this
> what you are looking for?
Babak Behzad writes:
> "h5dump -p" gives you a per dataset storage_layout information which
> contains the SIZE and OFFSET of the dataset. I always use it with "-H"
> command so that it just prints the header of the HDF5 file. For example:
Larry Knox writes:
> h5dump with the -p option may give you what you want Combining it
> with -H will rmove the data from the output, or add -d to limit the
Thanks to all of you for these suggestions. Both h5ls -v and h5dump -p
provide information about the size of each dataset, with h5ls -v
giving more detail (allocated size plus actual usage).
Unfortunately, both produce tons of other output, requiring serious
postprocessing to extract just the size information for a large
number of datasets in a large number of files.
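For now I will probably fall back on a small h5py script (h5py being
my own choice here, not something suggested above). A rough, untested
sketch that prints each dataset's allocated storage, largest first:

    import sys
    import h5py

    def dataset_sizes(filename):
        # Collect (allocated bytes, path) for every dataset in the file.
        sizes = []
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                # get_storage_size() reports the space actually allocated
                # in the file -- close to "how much smaller would the file
                # be if this dataset were not in there".
                sizes.append((obj.id.get_storage_size(), name))
        with h5py.File(filename, "r") as f:
            f.visititems(visit)
        return sorted(sizes, reverse=True)

    if __name__ == "__main__":
        for size, name in dataset_sizes(sys.argv[1]):
            print("%12d  %s" % (size, name))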
Konrad.
> "h5ls -v" provides an estimate of the datasets in a file. Is not this
> what you are looking for?
Babak Behzad writes:
> "h5dump -p" gives you a per dataset storage_layout information which
> contains the SIZE and OFFSET of the dataset. I always use it with "-H"
> command so that it just prints the header of the HDF5 file. For example:
Larry Knox writes:
> h5dump with the -p option may give you what you want Combining it
> with -H will rmove the data from the output, or add -d to limit the
Thanks to all of you for these suggestions. Both h5ls -v and h5dump -p
provide the information about the size of the dataset, with h5ls -v
providing more detailed information (allocated size plus real usage).
Unfortunately, both produce tons of other output, requiring serious
postprocessing for extracting just the size information for a large
number of datasets in a large number of files.
Konrad.
Hi Konrad,
It is likely still not what you want, but maybe my suggestion below is of some help:
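For example, something along these lines (the file name is a
placeholder, and the exact grep pattern may need adjusting):

    h5ls -r -v myfile.h5 | grep -E 'Dataset|Storage'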
[omit "-r" in case your files do not have any groups]