3D array view issue


#1

I can create an array attribute using h5py like this:

f.attrs["py-array-3x2x2"] = np.array(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
)

which looks to me like a 3 x (2 x 2) array, but HDFView displays it as a (3 x 2) x 2 array, and my own C# code shows the same.

Is this a bug in HDFView or something I’m not understanding correctly?

arraytest.h5 (1.4 KB)


#2

I had to check: this is not a bug (by definition), more a difference in expectation.
For whatever reason, someone decided that the paging should be along the fastest-changing dimension.
There is a bug, though: for attributes the user cannot change the paging/selection of the visual rows/cols and dimensions, whereas datasets have the OpenAs option.
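
To make the two views concrete, here is a small NumPy sketch using the array from the first post. It assumes that "paging along the fastest dimension" means the viewer fixes the last index and shows the remaining 3 x 2 slice on each page; that reading is an interpretation of the description above, not something taken from the HDFView source. The names `pages_by_first` and `pages_by_last` are made up for illustration.

```python
import numpy as np

# Same data as the attribute in the first post: shape (3, 2, 2).
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])

# Paging along the first (slowest-changing) dimension: 3 pages of 2 x 2.
pages_by_first = [arr[i, :, :] for i in range(arr.shape[0])]

# Paging along the last (fastest-changing) dimension: 2 pages of 3 x 2.
pages_by_last = [arr[:, :, k] for k in range(arr.shape[2])]

print("shape:", arr.shape)
for k, page in enumerate(pages_by_last):
    print(f"page {k} (last index fixed at {k}):")
    print(page)
```

The data in the file is the same either way; only the choice of which index selects the page differs.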

File a GitHub issue in the hdfview project and I will try to rectify this for the upcoming releases.


#3

Hi, @philip.lee!

Thank you for reporting an interesting problem regarding attributes!

Can you also measure performance of h5py vs. HDF5_JAVA (HDFView) using a bigger array like [1]?

[1] https://www.maplesoft.com/support/help/maple/view.aspx?path=Fortran_order


#4

h5py vs HDFView is not a valid comparison.
h5py vs the HDF5 JNI would be valid.


#5

Yes, that’s why I wrote HDF5_JAVA, the name used in java/CMakeLists.txt.


#6

I can compare array creation speed in C# vs Python if that’s of any use. I don’t have a Java environment.


#7

Thank you Philip for volunteering! If you get an interesting result, please contact @lori.cooper and get it published on blog.hdfgroup.org!


#8

The biggest 2D array of doubles I can create for an attribute is 90x90. After that I get the error “Unable to create attribute (object header message is too large)”.

A very informal comparison, with both run in release mode using VS 2022, is then:

Python:

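import time

import h5py
import numpy as np

d = 90
arr = np.random.rand(d, d)

with h5py.File("arraytest.h5", "w") as f:
    # time.perf_counter() stands in for the Stopwatch helper used originally
    start = time.perf_counter()
    for x in range(1000):
        f.attrs["py-array-{0}x{1}-{2}".format(d, d, x)] = arr
    elapsed = time.perf_counter() - start

print(elapsed)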

Takes 9.3 seconds.

My C# code:

Random r = new Random(Environment.TickCount);
int d = 90;

var sw = new Stopwatch();
var array = Enumerable.Range(0, d * d).Select(i => r.NextDouble()).ToArray();
sw.Start();
for (int i = 0; i < 1000; i++)
{
    file.WriteUnmanagedAttribute($"c#-{d}x{d}-{i}", array, new long[] { d, d });
}
Console.WriteLine(sw.ElapsedMilliseconds);

Takes 8.5 seconds.


#9

Good deal @philip.lee

For clarification, that’s an attribute size limitation in the earliest file format spec (which is the default). 90 x 90 doubles come to 64,800 bytes, just under 64 KiB (65,536 bytes). This restriction was dropped in later attribute layouts.
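
The arithmetic behind that limit can be checked with a few lines of Python. This is only a rough sketch: it counts the raw data bytes and ignores whatever the attribute message adds for the name, datatype and dataspace, and it treats 64 KiB as the cap on the assumption that this is the figure meant above. The helper `payload_bytes` is a made-up name.

```python
FLOAT64_BYTES = 8          # size of one double
MESSAGE_LIMIT = 64 * 1024  # 65,536 bytes; assumed cap for this sketch


def payload_bytes(rows, cols, itemsize=FLOAT64_BYTES):
    """Raw data size of a rows x cols attribute, excluding any header overhead."""
    return rows * cols * itemsize


for n in (90, 91, 100):
    size = payload_bytes(n, n)
    verdict = "fits" if size < MESSAGE_LIMIT else "too large"
    print(f"{n}x{n}: {size} bytes -> {verdict}")
```

So 90x90 is the largest square array of doubles that stays under the cap, which matches the error appearing at 91x91 and above.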

G.


#10

Thanks, setting the library version lower bound to ‘latest’ makes a huge difference.
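
For anyone following along in h5py, a minimal sketch of that setting is shown here. It uses the `libver` argument of `h5py.File`; the file name and attribute name are placeholders, and the `core` driver with `backing_store=False` is used only so the example leaves nothing on disk.

```python
import h5py
import numpy as np

arr = np.random.rand(100, 100)  # 80,000 bytes of data, above the old limit

# libver="latest" asks the library to use the newest file format features.
# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("libver-demo.h5", "w", libver="latest",
               driver="core", backing_store=False) as f:
    f.attrs["py-array-100x100"] = arr
    stored = f.attrs["py-array-100x100"]  # read back as a NumPy array

print(stored.shape)
```

The trade-off is that files written with the latest format may not be readable by older HDF5 library versions.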

@hyoklee
Now I get

create 1000 attributes of 100x100 double:
Python: ~200 ms
C#: ~110 ms

create 1 attribute of 2000x2000 double:
Python: ~75 ms
C#: ~85 ms


#11

Great result!

Would it be asking too much for you to run a read test as well? :slight_smile:

Anyway, thank you so much for testing and submitting issues on GitHub as well.

Talented users/developers like you make us happy!


#12

I don’t have reading arrays implemented for C# yet, but for h5py I get:

1 attribute of 2000x2000 double
create: ~75ms
read: ~30ms
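
A sketch of how such a create/read measurement might look in h5py follows. It is not the script used for the numbers above; `time_attr_roundtrip` is a made-up helper, and the read happens right after the write in the same session, so it may be served partly from cache. Treat any timings it prints as informal.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def time_attr_roundtrip(n):
    """Return (create_seconds, read_seconds, array_read_back) for one n x n float64 attribute."""
    arr = np.random.rand(n, n)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attr-timing.h5")
        # libver="latest" is needed for attributes larger than 64 KiB.
        with h5py.File(path, "w", libver="latest") as f:
            t0 = time.perf_counter()
            f.attrs["big"] = arr
            t1 = time.perf_counter()
            back = f.attrs["big"]
            t2 = time.perf_counter()
    return t1 - t0, t2 - t1, back


create_s, read_s, back = time_attr_roundtrip(2000)
print(f"create: {create_s * 1000:.0f} ms, read: {read_s * 1000:.0f} ms")
```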

I notice a possible bug in 1.10.6

If I do this (pseudo code-ish):

H5F.create (null acpl)
H5F.set_libver_bounds(file, latest, latest)

Then this doesn’t take effect on the file, and creating large attributes fails as before. If I now create a group on the file, I can create large attributes on the group.

If instead I do

create acpl
H5P.set_libver_bounds(acpl, latest, latest)
H5F.create (acpl)

Then I can create large attributes on the file.
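
The working order can be sketched with h5py's low-level API. This assumes the property list called "acpl" above is the file access property list, since that is the list `set_libver_bounds` applies to in h5py; the file name is a placeholder and the in-memory `core` driver is used only to avoid writing to disk.

```python
import h5py
import numpy as np

# Build the file access property list and set the bounds BEFORE creating the file.
fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
fapl.set_libver_bounds(h5py.h5f.LIBVER_LATEST, h5py.h5f.LIBVER_LATEST)
fapl.set_fapl_core(backing_store=False)  # in-memory file, nothing written to disk

# Create the file with that property list, then wrap it in a high-level File object.
fid = h5py.h5f.create(b"libver-order-demo.h5", h5py.h5f.ACC_TRUNC, fapl=fapl)
with h5py.File(fid) as f:
    f.attrs["large"] = np.zeros((100, 100))  # 80,000 bytes, attached to the root group
    shape_read_back = f.attrs["large"].shape

print(shape_read_back)
```

This mirrors the second sequence: the bounds are already in place when the file and its root group are created.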


#13

@hyoklee finally I have

1 attribute of 2000x2000 double
h5py (1.12.2): create: ~75ms, read: ~30ms
c# (1.10.6): create: ~85ms, read: ~38ms

I’m pretty sure Python can’t be faster than C# :slight_smile: so I’m putting it down to the version of HDF5.


#14

Yes, there were several performance improvements between 1.10.6 (2019-12-23) and 1.12.2 (2022-04-27). We are preparing HDF.PInvoke 1.10.9, which should be a tad faster.

G.


#15

I’m using the binaries from HDF.PInvoke, but not using PInvoke itself.
I’m also using some .NET 7 features (LibraryImport).
Have a look here if you’re interested: https://github.com/PhilPJL/HDF5.Api/tree/master/HDF5.Api/NativeMethods
and a different loader: https://github.com/PhilPJL/HDF5.Api/blob/master/HDF5.Api/NativeMethods/NativeProviderLoader.cs