Random HDF5 container/dataset generator

Greetings!
I would like to share this HDF5 utility project, which generates an HDF5 container with random layout and content. In addition to its main function, it can use graphviz to visualise the generated Prüfer-sequence-based tree.

The MIT-licensed project is meant to be used in a hackish way; customisation, pull requests, etc. are highly encouraged.

best wishes: steven


‘Prüfer’ also happens to mean ‘tester’ in German. And what a tester this is! A real pressure cooker for the HDF5 library :exploding_head:


For using h5rnd to “stress test” a UI, features such as long data object names, high rank, large dimensions, deeply nested datasets, etc. would be very useful. (In fact, anything whose representation impacts the geometry of canvas allocations would be.)

Would it be possible to include an illustrative HDF5 file in the repo for a quick look at the generated content?

Maybe this is an area that would require some customization. But it seems more promising than relying on relatively static generators or scouring the internet for “good” HDF5 files.

Thanks for the interest! I updated the h5rnd repository with the output for:

  • HDF5 container listing, produced with h5ls -r tree.h5
  • SVG image, the output of dot (see the manual for details)
  • graphviz file that drives the dot renderer: dot -Tsvg tree.gv -o tree.svg

enjoy!


HDF5 container listing: h5ls -r tree.h5

/                        Group
/AkWTsmZmlndfbRhKWywY    Dataset {1000}
/luSQtFEs                Group
/luSQtFEs/EcdhmBaFpNoplYIqUGHuVZj Group
/luSQtFEs/EcdhmBaFpNoplYIqUGHuVZj/ORmqQLNnWtDSQhfexAyrczVolGPY Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv Group
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/BtuejM Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/YSdbCOyP Group
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/YSdbCOyP/DNrnKgcxo Group
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/YSdbCOyP/DNrnKgcxo/CudEjqyaFtBGqcrFenMUhA Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/YSdbCOyP/VDorTtsiEIwtZ Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/YSdbCOyP/fuLsrYEurcvdOXWickWKFXLyMGY Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/qyHzilzoCyZlTaMcqZQ Dataset {1000}
/luSQtFEs/pOnCmrTKREyihDZBHmDyJDpiLv/wfXkhrapzAB Dataset {1000}
/luSQtFEs/yGnBRCmSxyrPv  Group
/luSQtFEs/yGnBRCmSxyrPv/bZPYDqZmhSXwunlyCwGMLu Group
/luSQtFEs/yGnBRCmSxyrPv/bZPYDqZmhSXwunlyCwGMLu/PnHPAPgjWqfkhBHeR Dataset {1000}
/luSQtFEs/yGnBRCmSxyrPv/bZPYDqZmhSXwunlyCwGMLu/QmNJYWKrrEXsqVxDAqkFQRPVHwaD Dataset {1000}
/luSQtFEs/yGnBRCmSxyrPv/rtaEOCMRrqKzAcRBEHQjFQ Dataset {1000}
/snWTGjtKQva             Dataset {1000}
/zzxgLfDdtDtViDnSLtrEzVpYKVZzr Dataset {1000}

graphviz: cat tree.gv. The file sets the defaults for datasets, then lists the folders (groups); render it with dot as a directed graph. See the Makefile for details.

digraph prufer {
	node [shape=note color=orange fontcolor=purple fontname="times bold" fillcolor=violet width=0.01 height=0.1 fontsize=9.0 labelloc=b style=bold];
	1 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	2 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	3 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	4 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	5 [shape=cylinder color=purple fillcolor=orange style=filled label=HDF5 weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	6 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	7 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]
	8 [shape=folder color=purple weight=2.0 width=0.75 height=0.5 fontsize=14.0 labelloc=c]

6 -> 0;
2 -> 9;
1 -> 2;
4 -> 10;
3 -> 4;
7 -> 11;
1 -> 12;
6 -> 13;
8 -> 14;
8 -> 15;
7 -> 8;
3 -> 7;
5 -> 16;
5 -> 17;
6 -> 18;
1 -> 19;
6 -> 1;
3 -> 6;
5 -> 3;
5 -> 20;
}
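For readers unfamiliar with Prüfer sequences, here is a minimal Python sketch of the textbook decoding (always attach the smallest remaining leaf to the next sequence element). It is not taken from h5rnd; the function name `prufer_to_edges` is made up for illustration, and the sequence `SEQ` was inferred by hand from the edge list above rather than copied from the project. Under those assumptions, the decoded edges come out in the same order as the `->` lines in tree.gv.

```python
import heapq

def prufer_to_edges(seq):
    """Decode a Prüfer sequence over nodes 0..n-1 into n-1 tree edges.

    Hypothetical helper for illustration; h5rnd's own implementation may differ.
    """
    n = len(seq) + 2
    # How many times each node still appears in the unread part of the sequence.
    remaining = [0] * n
    for v in seq:
        remaining[v] += 1
    # Nodes that never appear in the sequence start out as leaves.
    leaves = [v for v in range(n) if remaining[v] == 0]
    heapq.heapify(leaves)
    edges = []
    for parent in seq:
        leaf = heapq.heappop(leaves)      # smallest current leaf
        edges.append((parent, leaf))
        remaining[parent] -= 1
        if remaining[parent] == 0:        # parent has become a leaf itself
            heapq.heappush(leaves, parent)
    # Exactly two nodes are left; they form the last edge.
    a = heapq.heappop(leaves)
    b = heapq.heappop(leaves)
    edges.append((a, b))
    return edges

# Assumption: inferred from the edges listed in tree.gv, not read from h5rnd.
SEQ = [6, 2, 1, 4, 3, 7, 1, 6, 8, 8, 7, 3, 5, 5, 6, 1, 6, 3, 5]

if __name__ == "__main__":
    for parent, child in prufer_to_edges(SEQ):
        print(f"{parent} -> {child};")
```

A sequence of length 19 gives 21 nodes and 20 edges, which matches the listing: node 5 is the HDF5 root, the nodes that appear in the sequence are the groups, and the remaining nodes are the datasets.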

This is excellent. That helped a lot!


Would it be a reasonable extension to increase the diversity of the datatypes, and randomize array dimensions and dataset ranks? Also perhaps chunked datasets?
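To make the idea a bit more concrete, here is a rough Python sketch of what randomizing rank, dimensions, chunk shape and datatype could look like. It only produces a description of a dataset and does not call the HDF5 library; `random_layout`, `DTYPES` and the default limits are hypothetical names and values, not part of h5rnd. The two constraints it respects are the HDF5 rank limit of 32 and chunk dimensions that are positive and no larger than the corresponding (fixed-size) dataset dimension.

```python
import random

H5S_MAX_RANK = 32  # HDF5's upper limit on dataspace rank

# Assumption: plain string labels standing in for real HDF5 datatypes.
DTYPES = ("int8", "int32", "int64", "float32", "float64")

def random_layout(rng, max_rank=4, max_dim=64):
    """Draw a random dataset description (hypothetical helper, not h5rnd API)."""
    rank = rng.randint(1, min(max_rank, H5S_MAX_RANK))
    # Each extent is at least 1 so the dataset is never empty.
    dims = tuple(rng.randint(1, max_dim) for _ in range(rank))
    # A chunk has the same rank as the dataset; each chunk extent is kept
    # between 1 and the matching dataset extent.
    chunks = tuple(rng.randint(1, d) for d in dims)
    return {"dtype": rng.choice(DTYPES), "dims": dims, "chunks": chunks}

if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so a run can be reproduced
    for _ in range(3):
        print(random_layout(rng))
```

Passing in a seeded `random.Random` keeps a generated file reproducible, which matters if a particular random layout ends up exposing a bug.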

If so, I can consider a fork. (It would be a Windows/VS2017 build, as I’m set up for only Windows platforms.)