Reading multiple HDF5 files from a mounted remote folder

Hi Folks!
I already reported this bug in the h5py Google group, and they referred me here (https://groups.google.com/forum/?utm_source=digest&utm_medium=email#!topic/h5py/bipjHxRvqAM).

I am having exactly the same problem as the one reported in: https://stackoverflow.com/questions/54930250/how-to-use-h5py-to-access-multiple-hdf5-files-stored-on-google-team-drives-the?noredirect=1&lq=1
I haven’t managed to find a solution to this problem yet. Has this bug been addressed?
I am using h5py v2.10.0 on Windows 10, and I have a mounted folder via sshfs.
Do you have any suggestions?

Thank you in advance!

In your example,

h5file1 = h5py.File(filename1, 'r')
h5file2 = h5py.File(filename2, 'r')
print(f'{h5file1}: datasets = {list(h5file1.keys())}')
print(f'{h5file2}: datasets = {list(h5file2.keys())}')
h5file1.close()
h5file2.close()

what do you see if you print the FileIDs:

h5file1 = h5py.File(filename1, 'r')
h5file2 = h5py.File(filename2, 'r')
print(f'{h5file1.id}: name = {h5file1.id.name}')
print(f'{h5file2.id}: name = {h5file2.id.name}')
h5file1.close()
h5file2.close()

?

G.

Thank you Gheber for your reply.

This results in two different FileIDs, and the file names are the correct ones:

<h5py.h5f.FileID object at 0x000001E6C33129A0>: name = b'Z:/FileName1.hdf5'
<h5py.h5f.FileID object at 0x000001E6C3312F90>: name = b'Z:/FileName2.hdf5'

If I print the datasets instead, both listings correspond to the datasets of FileName1.hdf5.

I can confirm that this problem happens only in Python with h5py. If I repeat the exact same test (same files, same mounted folder, etc.) in MATLAB, I get the expected behaviour.

Thanks @minching for trying this. Just to be 100% sure. What happens if the two files reside in a directory on a local drive such as C:? That, presumably, works fine, which means there’s something weird going on here. Presumably h5py gets the file names via H5Fget_name on the IDs, which seems to suggest that the library is not confused as far as file handles go, and maybe a few wires are crossed in h5py? G.

Thank you @gheber. Yes, if the two files are on the local drive this doesn’t happen. But in my case I need to work continuously with multiple GB-sized files residing on supercomputers, and moving them to the local drive every time costs a lot of time and disk space.

Do you feel comfortable writing a little C program that goes through the same exercise? That way we could eliminate Python as a source and get a new data point. (The fact that MATLAB doesn’t show the problem suggests that the C program will behave just fine, but we won’t know until we try.) If you can’t, I’ll write a little test this afternoon and pass it on. G.

Yes thank you, that would help.

two_files.c (1.5 KB)
Compile the attached file with h5cc -o foo two_files.c, and then run foo or foo.exe. The expected output is this:

% ./foo 
One open file:
file1.h5 -> data1
file2.h5 -> data2
Two open files:
file1.h5 -> data1
file2.h5 -> data2

G.

Yes, I confirm that I obtain the following:

One open file:
file1.h5 -> data1
file2.h5 -> data2
Two open files:
file1.h5 -> data1
file2.h5 -> data2

Please note that I had to change H5L_info1_t* to H5L_info_t* (line 8) to make the script compile correctly.
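
For a data point on the h5py side, the same two-pass exercise can be sketched in Python. This is only a minimal sketch and not a port of the attached two_files.c: the helper names (keys_one_at_a_time, keys_all_open, mismatched_paths) and the open_file parameter are made up for illustration, and only top-level keys are compared.

```python
def _default_open(path):
    # Imported lazily so the helpers can also be driven by a stand-in opener.
    import h5py
    return h5py.File(path, "r")


def keys_one_at_a_time(paths, open_file=_default_open):
    """Open each file, list its top-level keys, and close it before the next."""
    listing = {}
    for path in paths:
        f = open_file(path)
        try:
            listing[path] = sorted(f.keys())
        finally:
            f.close()
    return listing


def keys_all_open(paths, open_file=_default_open):
    """Open every file first, then list keys while all handles are still open."""
    handles = []
    try:
        for path in paths:
            handles.append(open_file(path))
        return {p: sorted(f.keys()) for p, f in zip(paths, handles)}
    finally:
        for f in handles:
            f.close()


def mismatched_paths(paths, open_file=_default_open):
    """Return the paths whose key listing differs between the two passes."""
    single = keys_one_at_a_time(paths, open_file)
    together = keys_all_open(paths, open_file)
    return [p for p in paths if single[p] != together[p]]
```

Called as mismatched_paths(["Z:/FileName1.hdf5", "Z:/FileName2.hdf5"]), an empty list would mean both passes agree; any path returned listed different keys once the other file was open too.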

Thanks for trying this. It appears that, at least for this simple example, the C-library can tell the two files and objects apart. It’s too simple though to conclude that the problem is with h5py. Maybe we have to dig a little deeper into your setup. Can you describe in a little more detail your sshfs / Google drive setup? Are you using Cygwin or WSL? G.
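
One more thing that may be worth checking about the mount, offered as a guess and not as a diagnosis confirmed anywhere in this thread: as far as I understand, the HDF5 library decides whether a path refers to an already open file by comparing file system identifiers (device and inode on Unix, volume serial number and file index on Windows). If the sshfs mount reported the same identifiers for different files, that could fit the symptom. The sketch below uses only Python's os.stat to look for paths that share an identity; the function names are hypothetical.

```python
import os


def file_identity(path):
    """Return the (device, file index) pair that the OS reports for path."""
    st = os.stat(path)
    # st_ino is the inode number on Unix and the file index on Windows;
    # st_dev identifies the device or volume the file lives on.
    return (st.st_dev, st.st_ino)


def find_shared_identities(paths):
    """Return (first_path, later_path) pairs that report the same identity."""
    seen = {}
    shared = []
    for path in paths:
        ident = file_identity(path)
        if ident in seen:
            shared.append((seen[ident], path))
        else:
            seen[ident] = path
    return shared
```

For two genuinely different files, find_shared_identities(["Z:/FileName1.hdf5", "Z:/FileName2.hdf5"]) should normally come back empty. A non-empty result on the mounted drive, compared with an empty one on a local drive, would be a hint that the mount is involved.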

Thank you.
I set up my sshfs mounted folder following exactly this tutorial:
https://www.youtube.com/watch?v=uiXOuxdadms
I am using Windows 10 Home, build version 10.0.18363.
I haven’t tried with Google Drive. The Stack Overflow post I linked above is not mine, but it closely resembles my problem, so I guess you could use that setup to reproduce the problem more easily.

Let me know if you need any more information.

Hi, did you find a solution to your problem?
I have the same error under the same conditions with C++ HDF5:
One open file:
file1.h5 -> data1
file2.h5 -> data2
Two open files:
file1.h5 -> data1
file2.h5 -> data1
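
Going only by the outputs posted above, the pass with one file open at a time prints the right names even where the pass with two open files does not. A possible stopgap in h5py, assuming the data needed from each file fits in memory, is to copy it out and close the file before opening the next one. This is a sketch and not a fix for the underlying problem; the function name and the open_file parameter are hypothetical, and it assumes data1 and data2 in the output are dataset names.

```python
def _default_open(path):
    # Imported lazily so the helper can also be driven by a stand-in opener.
    import h5py
    return h5py.File(path, "r")


def read_one_file_at_a_time(dataset_by_path, open_file=_default_open):
    """Read one dataset per file, never keeping two files open together.

    dataset_by_path maps each file path to the dataset name to read from it.
    """
    data = {}
    for path, dataset_name in dataset_by_path.items():
        f = open_file(path)
        try:
            # Indexing with () reads the whole dataset into memory,
            # so the copy remains usable after the file is closed.
            data[path] = f[dataset_name][()]
        finally:
            f.close()
    return data
```

With the test files from this thread, the call would look like read_one_file_at_a_time({"file1.h5": "data1", "file2.h5": "data2"}). For multi-GB datasets the `[()]` read would need to be replaced by a smaller selection.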