3D array view issue

philip.lee · November 15, 2022, 9:19am

I can create an array using h5py like this

f.attrs["py-array-3x2x2"] = np.array(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
)

which looks to me like a 3 x (2 x 2) array, but HDFView displays it as a (3 x 2) x 2 array

and similarly in my own C# code

Is this a bug in HDF-View or something I’m not understanding correctly?

arraytest.h5 (1.4 KB)

byrn · November 15, 2022, 1:57pm

I had to check, but this is not a bug (by definition), more a difference in expectations.
For whatever reason, someone decided that the paging should be on the fastest-changing dimension.
There is a bug, though: the user cannot change the paging/selection of the visual rows/cols and dimensions for attributes, whereas datasets have the OpenAs option.

File a GitHub issue in the hdfview project and I will try to rectify this for the upcoming releases.
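
To make the two readings concrete, here is a minimal NumPy sketch of the same 3 x 2 x 2 array paged along the first axis versus the last (fastest-changing) axis. It only illustrates the indexing; that HDFView pages attributes exactly this way is an assumption based on the explanation above, and the names `pages_first` and `pages_last` are made up for the example.

```python
import numpy as np

# Same values as the py-array-3x2x2 attribute in the original post.
arr = np.array(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
)

# Paging along the first (slowest-changing) axis: 3 pages, each 2 x 2.
pages_first = [arr[i, :, :] for i in range(arr.shape[0])]

# Paging along the last (fastest-changing) axis: 2 pages, each 3 x 2.
pages_last = [arr[:, :, k] for k in range(arr.shape[2])]

print(arr.shape)                # (3, 2, 2)
print(pages_first[0].tolist())  # [[1, 2], [3, 4]]
print(pages_last[0].tolist())   # [[1, 3], [5, 7], [9, 11]]
```

The data on disk is the same in both cases; only the choice of which axis becomes the "page" differs.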

hyoklee · November 15, 2022, 2:52pm

Hi, @philip.lee!

Thank you for reporting an interesting problem regarding attributes!

Can you also measure the performance of h5py vs. HDF5_JAVA (HDFView) using a bigger array like the one in [1]?

[1] https://www.maplesoft.com/support/help/maple/view.aspx?path=Fortran_order

byrn · November 15, 2022, 3:02pm

h5py vs HDFView is not a valid comparison.
h5py vs the HDF5 JNI would be valid.

hyoklee · November 15, 2022, 3:08pm

Yes, that’s why I wrote HDF5_JAVA, the name used in java/CMakeLists.txt.

philip.lee · November 15, 2022, 3:53pm

I can compare array creation speed c# vs python if that’s of any use. I don’t have a java environment.

hyoklee · November 15, 2022, 6:16pm

Thank you Philip for volunteering! If you get an interesting result, please contact @lori.cooper and get it published on blog.hdfgroup.org!

philip.lee · November 16, 2022, 12:29pm

The biggest 2D array of doubles I can create for an attribute is 90x90. Beyond that I get the error “Unable to create attribute (object header message is too large)”.

A very informal comparison, with both run in release mode using VS 2022 is then:

Python:

import time

import h5py
import numpy as np

f = h5py.File("arraytest.h5", "w")

d = 90
arr = np.random.rand(d, d)

start = time.perf_counter()

# Write the same d x d array to 1000 differently named attributes
for x in range(1000):
    f.attrs["py-array-{0}x{1}-{2}".format(d, d, x)] = arr

elapsed = time.perf_counter() - start
print(elapsed)
f.close()

Takes 9.3 seconds.

My C# code

Random r = new Random(Environment.TickCount);
int d = 90;

var sw = new Stopwatch();
var array = Enumerable.Range(0, d * d).Select(i => r.NextDouble()).ToArray();
sw.Start();
for (int i = 0; i < 1000; i++)
{
    file.WriteUnmanagedAttribute($"c#-{d}x{d}-{i}", array, new long[] { d, d });
}
Console.WriteLine(sw.ElapsedMilliseconds);

Takes 8.5 seconds

gheber · November 16, 2022, 1:12pm

Good deal @philip.lee

For clarification, that’s an attribute size limitation in the earliest file format spec (which is the default). A 90x90 array of doubles is 64,800 bytes, which is pretty close to 64 KiB. This restriction was dropped in later attribute layouts.
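
The arithmetic behind that figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it compares the raw payload of a d x d array of 8-byte doubles against 64 KiB and does not model any header overhead, so the exact cutoff inside the library is an assumption here.

```python
# Raw payload of a d x d array of 8-byte doubles versus a 64 KiB ceiling.
LIMIT_BYTES = 64 * 1024  # 65,536 bytes; assumed ceiling, overhead not modelled
DOUBLE_BYTES = 8


def attr_payload_bytes(d):
    """Return the number of data bytes in a d x d array of doubles."""
    return d * d * DOUBLE_BYTES


for d in (90, 91):
    size = attr_payload_bytes(d)
    print(d, size, size < LIMIT_BYTES)
# 90 64800 True
# 91 66248 False
```

This matches the observation that 90x90 is the largest square array of doubles that fits.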

G.

philip.lee · November 16, 2022, 5:55pm

Thanks, setting the low library version bound to ‘latest’ makes a huge difference.

@hyoklee
Now I get

create 1000 attributes of 100x100 double:
python ~200ms
c# ~110ms

create 1 attribute of 2000x2000 double
python ~ 75ms
c# ~85ms
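
For h5py users, a minimal sketch of the library-version setting mentioned above is to pass `libver="latest"` when opening the file. The file name, attribute name, and helper `write_large_attr` are made up for this example, and it assumes an h5py/HDF5 build where this option lifts the 64 KiB attribute limit, as reported in this thread.

```python
import os
import tempfile

import h5py
import numpy as np


def write_large_attr(path, d=100):
    """Write a d x d double attribute (80,000 bytes for d=100) and read its shape back."""
    name = "py-array-{0}x{0}".format(d)
    # libver="latest" asks HDF5 to use the newest file format it supports
    with h5py.File(path, "w", libver="latest") as f:
        f.attrs[name] = np.random.rand(d, d)
    with h5py.File(path, "r") as f:
        return f.attrs[name].shape


path = os.path.join(tempfile.mkdtemp(), "arraytest_latest.h5")
print(write_large_attr(path))
```

Files written this way may not be readable by older HDF5 releases, so the setting is a trade-off rather than a free switch.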

hyoklee · November 16, 2022, 6:39pm

Great result!

Would it be asking too much for you to run a read test as well?

Anyway, thank you so much for testing and submitting issues on GitHub as well.

Talented users/developers like you make us happy!

philip.lee · November 16, 2022, 8:26pm

I don’t have read array implemented for C# yet but for h5py I get

1 attribute of 2000x2000 double
create: ~75ms
read: ~30ms

I notice a possible bug in 1.10.6

If I do this (pseudo code-ish):

H5F.create (null fapl)
H5F.set_libver_bounds(file, latest, latest)

Then this doesn’t take effect on the file, and creating large attributes on it fails as before. If I now create a group in the file, I can create large attributes on the group.

If instead I do

create fapl
H5P.set_libver_bounds(fapl, latest, latest)
H5F.create (fapl)

Then I can create large attributes on the file.
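
The working sequence can also be sketched with h5py's low-level API, which mirrors the C calls: create a file access property list, set the version bounds on it, then create the file with it. The helper name `create_with_latest_bounds`, the file name, and the attribute name are made up for illustration, and this does not reproduce the 1.10.6 behaviour described above; it only shows the second ordering.

```python
import os
import tempfile

import h5py
import numpy as np


def create_with_latest_bounds(path):
    """Create a file whose access property list has libver bounds (latest, latest)."""
    fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    fapl.set_libver_bounds(h5py.h5f.LIBVER_LATEST, h5py.h5f.LIBVER_LATEST)
    fid = h5py.h5f.create(path.encode("utf-8"), h5py.h5f.ACC_TRUNC, fapl=fapl)
    # Wrap the low-level file id in the usual high-level File object
    return h5py.File(fid)


path = os.path.join(tempfile.mkdtemp(), "arraytest_fapl.h5")
f = create_with_latest_bounds(path)
f.attrs["big"] = np.zeros((100, 100))  # 80,000 bytes, above the old limit
shape = f.attrs["big"].shape
f.close()
print(shape)
```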

philip.lee · November 18, 2022, 4:08pm

@hyoklee finally I have

1 attribute of 2000x2000 double
h5py (1.12.2): create: ~75ms, read: ~30ms
c# (1.10.6): create: ~85ms, read: ~38ms

I’m pretty sure Python can’t be faster than C#, so I’m putting it down to the version of HDF5.

gheber · November 18, 2022, 4:44pm

Yes, we had several performance improvements between 1.10.6 (2019-12-23) and 1.12.2 (2022-04-27). We are preparing HDF.PInvoke 1.10.9, which should be a tad faster.

G.

philip.lee · November 18, 2022, 4:48pm

I’m using the binaries from HDF-Pinvoke, but not using PInvoke itself.
I’m also using some .NET7 features (LibraryImport).
Have a look here if you’re interested: https://github.com/PhilPJL/HDF5.Api/tree/master/HDF5.Api/NativeMethods
and a different loader https://github.com/PhilPJL/HDF5.Api/blob/master/HDF5.Api/NativeMethods/NativeProviderLoader.cs
