Simplified geospatial data representation?

I’m looking for recommendations / suggestions on how to use the HDF5 format better for large geospatial datasets related to SAR imagery. I realize this may not be the most relevant forum, but it’s probably a good idea to bring it up anyway. This is a follow-up to a discussion on the gdal-dev thread:

After a fair bit of research, it looks like HDF-EOS5 and kealib are the available options. Both formats have rigid group naming schemes, and HDF-EOS5 seems particularly tough to work with and has many limitations, including:

  1. Complex number support: it’s easy to store complex data as compound datatypes, as h5py does, and GDAL 2.3+ supports this.

  2. Support for N-bit ints / floats: NumPy supports float16, and GDAL also supports it (converting to the nearest native type on read).

  3. Unnecessarily complicated way of supporting projections and coordinate systems.
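For points 1 and 2, here is a minimal h5py sketch of writing complex and float16 arrays to plain HDF5 today. The file and dataset names (`layers.h5`, `slc`, `coherence`) are hypothetical. h5py stores `complex64` as an HDF5 compound type with `r` and `i` members and `float16` as a 16-bit IEEE float type, and both read back as the same NumPy dtypes.

```python
import os
import tempfile

import h5py
import numpy as np


def write_sar_layers(path):
    """Write a complex64 layer and a float16 layer; dataset names are hypothetical."""
    rng = np.random.default_rng(0)
    shape = (4, 5)
    # complex64: h5py stores this as an HDF5 compound type with members "r" and "i"
    slc = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)).astype(np.complex64)
    # float16: h5py stores this as a 16-bit IEEE floating-point type
    coh = rng.random(shape).astype(np.float16)
    with h5py.File(path, "w") as f:
        f.create_dataset("slc", data=slc)
        f.create_dataset("coherence", data=coh)
    return slc, coh


path = os.path.join(tempfile.mkdtemp(), "layers.h5")
slc, coh = write_sar_layers(path)
with h5py.File(path, "r") as f:
    # Both layers come back with their original NumPy dtypes
    print(f["slc"].dtype, f["coherence"].dtype)  # complex64 float16
```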

I have been using HDF5 to work with large data cubes and have had no problems letting GDAL do the manipulation by writing VRTs that point to array slices.
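A rough sketch of such a VRT wrapper, built with the standard library only. It assumes GDAL exposes the HDF5 cube as a multi-band subdataset named `HDF5:"<file>"://<dataset>`. The helper `band_vrt`, the path `/data/stack.h5`, the dataset name `cube`, and the georeferencing numbers are all hypothetical.

```python
import xml.etree.ElementTree as ET


def band_vrt(h5_path, dataset_path, band, width, height, geotransform, epsg,
             data_type="Float32"):
    """Hypothetical helper: wrap one band of an HDF5 subdataset in a georeferenced VRT."""
    root = ET.Element("VRTDataset", rasterXSize=str(width), rasterYSize=str(height))
    # SRS given as a user-input string that GDAL can resolve, e.g. "EPSG:32611"
    ET.SubElement(root, "SRS").text = f"EPSG:{epsg}"
    # GDAL geotransform: 6 comma-separated values
    ET.SubElement(root, "GeoTransform").text = ", ".join(repr(float(v)) for v in geotransform)
    vrt_band = ET.SubElement(root, "VRTRasterBand", dataType=data_type, band="1")
    src = ET.SubElement(vrt_band, "SimpleSource")
    # Assumed GDAL HDF5 subdataset naming: HDF5:"file"://dataset
    ET.SubElement(src, "SourceFilename", relativeToVRT="0").text = f'HDF5:"{h5_path}"://{dataset_path}'
    # Which band of the subdataset (i.e. which slice of the cube) to expose
    ET.SubElement(src, "SourceBand").text = str(band)
    return ET.tostring(root, encoding="unicode")


xml = band_vrt("/data/stack.h5", "cube", band=3, width=5000, height=4000,
               geotransform=(440000.0, 30.0, 0.0, 3750000.0, 0.0, -30.0), epsg=32611)
print(xml)
```

Saving the string to a `.vrt` file is what lets GIS tools open a single slice. This is the extra layer the next paragraphs would like to avoid.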

However, I’m wondering whether others have thought of a simpler representation of geospatial metadata for HDF5 (similar to GeoTIFF). Now that GDAL has added virtual file system support for HDF5, it is a really attractive format for datasets that are typically broken up into a large number of GeoTIFF files.

One could just attach a simple geotransform (an array of 6 doubles) and an EPSG code (an integer) as attributes on any 2D / 3D array, with minimal overhead. This would remove the need for an extra layer of VRTs to work with these datasets in GIS tools.
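A minimal h5py sketch of that idea. The attribute names `geotransform` and `epsg` are hypothetical, not an existing convention, and no GIS tool would read them without support being added. The `pixel_to_map` helper applies the standard GDAL geotransform formula to show how a reader would use the six doubles.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical attribute names; not an established convention
GT_ATTR = "geotransform"
EPSG_ATTR = "epsg"


def write_georeferenced(path, name, data, geotransform, epsg):
    """Store an array plus a 6-double geotransform and an integer EPSG code as attributes."""
    with h5py.File(path, "a") as f:
        dset = f.create_dataset(name, data=data, chunks=True)
        dset.attrs[GT_ATTR] = np.asarray(geotransform, dtype=np.float64)
        dset.attrs[EPSG_ATTR] = np.int32(epsg)


def read_georeference(path, name):
    """Return (geotransform tuple, EPSG code) from the hypothetical attributes."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        return tuple(float(v) for v in dset.attrs[GT_ATTR]), int(dset.attrs[EPSG_ATTR])


def pixel_to_map(gt, col, row):
    """GDAL geotransform formula; integer col/row give the pixel's top-left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


path = os.path.join(tempfile.mkdtemp(), "stack.h5")
cube = np.zeros((3, 4, 5), dtype=np.float32)  # bands x rows x cols
write_georeferenced(path, "cube", cube, (440000.0, 30.0, 0.0, 3750000.0, 0.0, -30.0), 32611)
gt, epsg = read_georeference(path, "cube")
print(epsg, pixel_to_map(gt, 2, 1))  # 32611 (440060.0, 3749970.0)
```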

Any thoughts or suggestions?


For 1) and 2), it would be good to add new data types (complex, float16) to HDF5.

We are not in a position to improve the HDF-EOS5 library. However, there is a netCDF-CF approach that may work for you. See: