Hi.
We have an application that iterates over a set of sequential operations on a single machine.
In each of these steps it stores a large number of small metadata items. These are put into different subgroups and stored as individual datasets. It also creates a single large image (~48 MB).
But due to all the HTTP requests for the small metadata, this part ends up being very time consuming, even more so than sending the large image.
This leads us to the following questions:
Is there any way of sending entire groups in a single request?
Or is the only way to achieve this to create a local file, upload it with hsload, and then link it into the original file?
Also, is it possible to create multiple links in a single HTTP request?
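For illustration, one workaround sketch (assuming the metadata is JSON-serializable and that reading it back as a single blob is acceptable; `to_jsonable` and the field values are made up for the example) would be to serialize the whole nested structure and store it as one string dataset, so all the small values travel in a single request:

```python
import json


class ThingItem:
    def __init__(self, name, age, version, data):
        self.name = name
        self.age = age
        self.version = version
        self.data = data


def to_jsonable(obj):
    # Convert ThingItem instances to plain dicts so json.dumps can handle them
    if isinstance(obj, ThingItem):
        return {"name": obj.name, "age": obj.age,
                "version": obj.version, "data": obj.data}
    raise TypeError(f"unserializable type: {type(obj)}")


things = {"item1": 42, "item2": "string test",
          "child1": ThingItem("Jens", 42, 1, {"name": "John"})}

# One blob instead of many small objects
blob = json.dumps(things, default=to_jsonable)

# With h5pyd this would then be a single write, e.g.:
#   f.create_dataset("meta_blob", data=blob)

# Round trip for sanity
restored = json.loads(blob)
print(restored["child1"]["name"])  # Jens
```

The obvious trade-off is that the metadata is no longer browsable as individual HDF5 objects on the server.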
I hope this h5pyd example shows what we are experiencing.
Running with a local POSIX instance and 6 threads (runall.sh 6) gives me the following timings:
Mean timings datasets: 1.738 s, std: 0.368
Mean timings Im: 0.819 s, std: 0.218
import h5pyd as h5py
import time
import uuid
import numpy as np


class ThingItem:
    def __init__(self, name, age, version, data):
        self.name = name
        self.age = age
        self.version = version
        self.data = data
"""
Storing approaches
"""
def store(group, items):
    for key, val in items.items():
        if isinstance(val, dict):
            g = group.require_group(key)
            store(g, val)
        elif isinstance(val, ThingItem):
            # give each item its own subgroup so the attributes don't overwrite each other
            g = group.require_group(key)
            g.attrs["name"] = val.name
            g.attrs["age"] = val.age
            g.attrs["version"] = val.version
            store(g, val.data)
        else:
            group.create_dataset(key, data=val)
totRunningTime = 0

"""
Creating test data
"""
child = {}
child["name"] = "John"
child["age"] = "32"
child["address"] = "some street"

itm = ThingItem("Jens", 42, 1, child)

things = {}
things["item1"] = 42
things["item2"] = "string test"
things["child1"] = itm
things["child2"] = itm
things["child3"] = itm
things["child4"] = itm
"""
Running the test
"""
N = 100
timingsData = np.zeros(N)
timingsIm = np.zeros(N)

for i in range(N):
    filename = str(uuid.uuid4()) + "test8.h5"
    fqdn = "/home/test_user1/" + filename
    print(f"iteration {i} file: {fqdn}")

    start = time.time()
    with h5py.File(fqdn, "a") as f:
        g = f.require_group("/test")
        store(g, things)
    end = time.time()
    timingsData[i] = end - start
    print(f"Saving small datasets: {timingsData[i]} s")

    im = np.random.randint(0, 10, size=[6000, 4000], dtype=np.int16)
    start = time.time()
    with h5py.File(fqdn, "a") as f:
        f["im"] = im
    end = time.time()
    timingsIm[i] = end - start
    print(f"Saving image: {timingsIm[i]} s")
    print("")
print(f"Mean timings datasets {np.mean(timingsData)}, std: {np.std(timingsData)}")
print(f"Mean timings Im {np.mean(timingsIm)}, std: {np.std(timingsIm)}")
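For comparison, when the small records share a fixed schema they could also be packed into one NumPy structured array and written as a single compound dataset, turning many small requests into one (a sketch; the dtype and field names are illustrative, not our real schema):

```python
import numpy as np

# Illustrative fixed schema for the small metadata records
dt = np.dtype([("name", "S16"), ("age", np.int32), ("version", np.int32)])

records = [("Jens", 42, 1), ("John", 32, 1), ("Anna", 28, 2)]
arr = np.array(records, dtype=dt)

# With h5pyd, writing this whole table would then be one request, e.g.:
#   f.create_dataset("meta_table", data=arr)

print(arr.shape)       # (3,)
print(arr["name"][0])  # b'Jens'
```

The trade-off is losing the per-item group/attribute layout on the server, so this only helps when the records are homogeneous enough to share a dtype.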