Reading multiple HDF5 files on a remotely mounted folder


#1

Hi Folks!
I already reported this bug in the h5py Google group, and they pointed me here (https://groups.google.com/forum/?utm_source=digest&utm_medium=email#!topic/h5py/bipjHxRvqAM).

I am having exactly the problem reported here: https://stackoverflow.com/questions/54930250/how-to-use-h5py-to-access-multiple-hdf5-files-stored-on-google-team-drives-the?noredirect=1&lq=1
I haven’t managed to find a solution yet. Has this bug been addressed?
I am using h5py v2.10.0 on Windows 10, with a remote folder mounted via sshfs.
Do you have any suggestions?

Thank you in advance!


#2

Your example does this:

import h5py

# filename1 and filename2 are the two paths on the mounted drive
h5file1 = h5py.File(filename1, 'r')
h5file2 = h5py.File(filename2, 'r')
print(f'{h5file1}: datasets = {list(h5file1.keys())}')
print(f'{h5file2}: datasets = {list(h5file2.keys())}')
h5file1.close()
h5file2.close()

What do you see if you print the FileIDs instead?

h5file1 = h5py.File(filename1, 'r')
h5file2 = h5py.File(filename2, 'r')
print(f'{h5file1.id}: name = {h5file1.id.name}')
print(f'{h5file2.id}: name = {h5file2.id.name}')
h5file1.close()
h5file2.close()

G.


#3

Thank you, Gheber, for your reply.

This gives two different FileIDs, and the file names are the correct ones:

<h5py.h5f.FileID object at 0x000001E6C33129A0>: name = b'Z:/FileName1.hdf5'
<h5py.h5f.FileID object at 0x000001E6C3312F90>: name = b'Z:/FileName2.hdf5'

If I print the datasets instead, both files show the datasets of FileName1.hdf5.

I can confirm that the problem only happens in Python with h5py: if I repeat the exact same test (same files, same mounted folder, etc.) in MATLAB, I get the expected behaviour.


#4

Thanks @minching for trying this. Just to be 100% sure: what happens if the two files reside in a directory on a local drive such as C:? Presumably that works fine, which would mean there’s something weird going on here. I assume h5py gets the file names via H5Fget_name on the IDs, which suggests that the library is not confused as far as file handles go; maybe a few wires are crossed in h5py? G.


#5

Thank you @gheber. Yes, if the two files are on the local drive, this doesn’t happen. But I need to work continuously with multiple GB-sized files residing on supercomputers, and copying them to the local drive every time costs a lot of time and disk space.


#6

Would you feel comfortable writing a little C program that goes through the same exercise? That way we could eliminate Python as a source and get a new data point. (The fact that MATLAB doesn’t show the problem suggests that the C program will behave just fine, but we won’t know until we try.) If you can’t, I’ll write a little test this afternoon and pass it on. G.


#7

Yes, thank you, that would help.


#8

two_files.c (1.5 KB)
Compile the attached file with h5cc -o foo two_files.c, and then run foo or foo.exe. The expected output is this:

% ./foo 
One open file:
file1.h5 -> data1
file2.h5 -> data2
Two open files:
file1.h5 -> data1
file2.h5 -> data2

G.
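Since the open question is whether the problem sits in h5py, a rough h5py counterpart of the same check may serve as a second data point. This is a minimal sketch, not a port of the attached two_files.c (its source is not reproduced here); it only borrows the file and dataset names from the expected output above. It writes file1.h5 and file2.h5 into the folder given on the command line (or a fresh local temp folder if none is given), then lists their contents with one file open at a time and with both open at once.

```python
import os
import sys
import tempfile

import h5py
import numpy as np

# File -> dataset names, mirroring the expected output of the C test.
NAMES = {"file1.h5": "data1", "file2.h5": "data2"}


def create_files(directory):
    """Write file1.h5 and file2.h5, each holding one small dataset."""
    for fname, dset in NAMES.items():
        with h5py.File(os.path.join(directory, fname), "w") as f:
            f.create_dataset(dset, data=np.arange(4))


def one_open_file(directory):
    """Open the files one at a time and list their top-level names."""
    result = {}
    for fname in NAMES:
        with h5py.File(os.path.join(directory, fname), "r") as f:
            result[fname] = list(f.keys())
    return result


def two_open_files(directory):
    """Keep both files open at the same time (the case reported to fail)."""
    fname1, fname2 = NAMES
    path1 = os.path.join(directory, fname1)
    path2 = os.path.join(directory, fname2)
    with h5py.File(path1, "r") as f1, h5py.File(path2, "r") as f2:
        return {fname1: list(f1.keys()), fname2: list(f2.keys())}


def report(title, listing):
    """Print a listing in the same layout as the C program's output."""
    print(title)
    for fname, keys in listing.items():
        print(f"{fname} -> {', '.join(keys)}")


if __name__ == "__main__":
    # Placeholder: pass a folder on the sshfs mount, or nothing for a local baseline.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    create_files(target)
    report("One open file:", one_open_file(target))
    report("Two open files:", two_open_files(target))
```

Running it once against a folder on the mounted drive (the script name and path are up to you) and once with no argument gives a direct remote-versus-local comparison. If the problem from #3 reproduces, the "Two open files" listing on the mount would show data1 under both file names.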


#9

Yes, I confirm that I get the following:

One open file:
file1.h5 -> data1
file2.h5 -> data2
Two open files:
file1.h5 -> data1
file2.h5 -> data2

Please note that I had to change H5L_info1_t* to H5L_info_t* (line 8) to make the program compile.


#10

Thanks for trying this. It appears that, at least for this simple example, the C library can tell the two files and objects apart. It’s too simple, though, to conclude that the problem is with h5py. Maybe we have to dig a little deeper into your setup. Can you describe your sshfs / Google Drive setup in a little more detail? Are you using Cygwin or WSL? G.
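One thing that might be worth checking on the mount, offered as an untested hypothesis rather than a diagnosis: when a file is opened, the HDF5 library looks for an already-open file with the same OS-level identity (device and inode on POSIX systems; volume serial number and file index on Windows) and reuses it if one matches. If the sshfs mount reported the same identity for two different files, the second open could end up sharing the first file, which would be consistent with #3 (distinct FileIDs and names, but the datasets of FileName1.hdf5). Python's os.stat exposes these values as st_dev and st_ino, so a minimal sketch to compare them needs only the standard library; the function names and the a.h5/b.h5 baseline files below are illustrative choices, not anything from this thread.

```python
import os
import sys
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair the OS reports for a path.

    On Windows, os.stat fills st_dev from the volume serial number and
    st_ino from the file index; st_ino may be 0 if the filesystem does
    not supply one.
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino


def check(path_a, path_b):
    """Print both identities and return True if they are identical."""
    dev_a, ino_a = file_identity(path_a)
    dev_b, ino_b = file_identity(path_b)
    print(f"{path_a}: dev={dev_a} ino={ino_a}")
    print(f"{path_b}: dev={dev_b} ino={ino_b}")
    # os.path.samefile compares exactly these two fields.
    collide = os.path.samefile(path_a, path_b)
    if collide:
        print("Both paths report the same identity.")
    return collide


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # e.g. the two .hdf5 files on the Z: mount
        check(sys.argv[1], sys.argv[2])
    else:
        # Baseline: two brand-new, distinct files in the local temp folder.
        with tempfile.TemporaryDirectory() as d:
            a, b = os.path.join(d, "a.h5"), os.path.join(d, "b.h5")
            for p in (a, b):
                open(p, "wb").close()
            check(a, b)
```

If two different files on the mount come back with the same dev/ino pair (for example both with ino=0), that would point at the filesystem layer rather than at h5py itself; if they differ, this hypothesis can be set aside.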


#11

Thank you.
I set up my sshfs-mounted folder following exactly this tutorial:
https://www.youtube.com/watch?v=uiXOuxdadms
I am using Windows 10 Home, build 10.0.18363.
I haven’t tried Google Drive; the Stack Overflow post I linked above isn’t mine, but it matches my problem exactly, so you could probably use that setup to reproduce the problem more easily.

Let me know if you need any more information.