Hi,
I'm planning to use HDF5 to store data from large image datasets
(1M+ images). Ideally I'd like to store the data from each image as a
separate dataset in an HDF5 file. When I experimented previously with
a flat hierarchy (i.e. each image linked directly to the root group),
HDF5 became extremely slow at almost any operation (iteration, etc.).
When I asked about this on the PyTables mailing list, the consensus
seemed to be that a hierarchy in which no group holds more than 256
nodes was required to maintain acceptable speeds (with v1.6.5). At the
time I didn't follow this suggestion up, but I'm wondering now whether
the situation has changed at all with HDF5 1.8.0.
Does anyone else have experience with storing this many datasets?
Many thanks,
James
These were the C programs I used to stress test 1.6.5 (creation was
fairly fast, but iteration took >1 sec per dataset):
--- hdf5_stress_test.c ---
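#include <stdio.h>
#include <stdlib.h>
#include "H5LT.h"

int
main(void)
{
    hid_t   file_id;
    hsize_t dims[2];
    int     data[256] = {0};    /* zero-filled 16x16 payload */
    char    dset_name[32];
    herr_t  status;
    int     i;
    int     total = 1000000;

    dims[0] = 16;
    dims[1] = 16;

    file_id = H5Fcreate("hdf5_stress_test.h5", H5F_ACC_TRUNC,
                        H5P_DEFAULT, H5P_DEFAULT);
    if (file_id < 0)
        return EXIT_FAILURE;

    /* Link every dataset directly to the root group (flat hierarchy). */
    for (i = 0; i < total; ++i) {
        snprintf(dset_name, sizeof dset_name, "/dset_%07d", i);
        status = H5LTmake_dataset(file_id, dset_name, 2, dims,
                                  H5T_NATIVE_INT, data);
        if (status < 0) {
            fprintf(stderr, "\nfailed to create %s\n", dset_name);
            break;
        }
        /* Progress report every 1000 datasets. */
        if (!(i % 1000))
            printf("\r[%07d/%07d]", i, total);
        fflush(stdout);
    }

    status = H5Fclose(file_id);
    return status < 0 ? EXIT_FAILURE : 0;
}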
--- ---
--- hdf5_iterate.c ---
#include <stdio.h>
#include "hdf5.h"

herr_t file_info(hid_t loc_id, const char *name, void *opdata);

int
main(void)
{
    hid_t file;

    file = H5Fopen("hdf5_stress_test.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;

    /* Visit every member of the root group, calling file_info on each. */
    H5Giterate(file, "/", NULL, file_info, NULL);

    H5Fclose(file);
    return 0;
}

herr_t file_info(hid_t loc_id, const char *name, void *opdata)
{
    H5G_stat_t statbuf;

    (void)opdata;   /* unused */

    /*
     * Get type of the object and display its name and type.
     * The name of the object is passed to this function by
     * the Library.
     */
    H5Gget_objinfo(loc_id, name, 0, &statbuf);
    switch (statbuf.type) {
    case H5G_GROUP:
        printf(" Object with name %s is a group \n", name);
        break;
    case H5G_DATASET:
        printf(" Object with name %s is a dataset \n", name);
        break;
    case H5G_TYPE:
        printf(" Object with name %s is a named datatype \n", name);
        break;
    default:
        printf(" Unable to identify an object \n");
    }
    return 0;   /* zero tells H5Giterate to continue */
}
--- ---
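For reference, the kind of hierarchy suggested on the PyTables list
could be sketched roughly as below. This is only an illustration of the
naming scheme, not something I have benchmarked: the function name
image_path, the FANOUT constant and the "g_NNN" group names are all my
own assumptions, and the code only builds the path string (it makes no
HDF5 calls). With a fan-out of 256, two levels of groups cover up to
256^3 (about 16.7M) images while keeping every group at 256 children
or fewer.

```c
#include <stdio.h>

/* Assumed limit on children per group, taken from the PyTables list advice. */
#define FANOUT 256UL

/*
 * Build a nested path "/g_TTT/g_MMM/dset_NNNNNNN" for image index i.
 * Each leaf group holds at most FANOUT datasets and each top-level
 * group holds at most FANOUT leaf groups. The root stays within
 * FANOUT children only while i < FANOUT^3.
 * Returns what snprintf returns: the length of the full path, which is
 * >= len if the buffer was too small.
 */
int image_path(char *buf, size_t len, unsigned long i)
{
    unsigned long top = i / (FANOUT * FANOUT);   /* top-level group index */
    unsigned long mid = (i / FANOUT) % FANOUT;   /* group inside the top group */

    return snprintf(buf, len, "/g_%03lu/g_%03lu/dset_%07lu", top, mid, i);
}
```

In the stress test above this would replace the "/dset_%07d" name, but
the intermediate groups would have to be created first (for example
with H5Gcreate) before H5LTmake_dataset is called on the full path.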