Hi, I am a bit disoriented. I am using h5py to read datasets from a hierarchical, multi-group HDF5 file.
I first implemented a neural network in a Jupyter notebook on Google Colab. Then I moved most of the code into custom libraries, so that the notebook only runs a single command with arguments, but the run became much slower!
In both cases, the code that reads the datasets from the HDF5 files lives in custom libraries (.py files called from Jupyter).
The only difference is that in the original version I loop over epochs and batches in the notebook itself, whereas in the new (and much slower) version the loops over epochs and batches are also inside the custom libraries. That's all.
In the latter case the code is much slower. I carefully tracked down the origin of the delay, and it occurs when reading and handling datasets with h5py.
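To compare the two setups fairly, it helps to time the same call in exactly the same way in both. Below is a minimal, stdlib-only sketch, assuming the code to measure can be wrapped in a zero-argument callable; `time_call` is a hypothetical helper, not part of h5py or PyTorch.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 20, warmup: int = 2) -> float:
    """Return the median wall-clock time of fn() in seconds.

    Warm-up calls are run first and discarded, so one-off costs
    (first file access, caches being filled) do not skew the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    # Hypothetical usage: `loader` would be the dataset object from the question.
    # print(time_call(lambda: loader._get_signal_window()))
    print(f"{time_call(lambda: sum(range(100_000))):.6f} s")
```

The median is used instead of the mean so a few slow outliers do not dominate. If the timed code also launches GPU work, calling `torch.cuda.synchronize()` before reading the clock keeps queued CUDA work from being attributed to whichever line happens to block next.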
For instance, the following code
```python
import torch  # needed for torch.Tensor below


def _get_signal_window(self, with_labels=False):
    # Refill the pool of randomly ordered windows once it is exhausted
    if self.get_number_of_avail_windows() == 0:
        self._reset_random_wind()
    sample = self._get_sample()
    Cnp = sample[0]
    Duration = sample[1]
    Dnp = sample[2]
    window_number = sample[3]
    # >>>>>>>>>>>>> HERE IS THE DIFFERENCE
    # Group names in the file are 1-based, e.g. 'Cnp_1/Duration_1/Dnp_1/data'
    dset = self.File['Cnp_' + str(Cnp + 1) + '/Duration_' + str(Duration + 1) + '/Dnp_' + str(Dnp + 1) + '/data']
    assert dset.shape[1] % self.length == 0
    # Integer division is exact here because of the assert above
    samples_per_second = dset.shape[1] // self.length
    samples_per_window = int(samples_per_second * self.window)
    begin = window_number * samples_per_window
    end = begin + samples_per_window
    # Row 0: time axis, row 1: clean signal, row 2: noisy signal (one read each)
    time_window = torch.Tensor(dset[0, begin:end]).to(self.device)
    clean_signal = torch.Tensor(dset[1, begin:end]).to(self.device)
    noisy_signal = torch.Tensor(dset[2, begin:end]).to(self.device)
    if with_labels:
        starts, widths, amplitudes, categories, number_of_pulses, average_width, average_amplitude = self._get_labels(time_window, Cnp, Duration, Dnp)
        # >>>>>>>>>>>>> HERE IS THE DIFFERENCE
        return time_window, clean_signal, noisy_signal, starts, widths, amplitudes, categories, number_of_pulses, average_width, average_amplitude
    else:
        return time_window, clean_signal, noisy_signal
```
runs at least 6 times faster when the epoch and batch loops are in Jupyter than when they are in the custom libraries (remember that the function above lives in a custom library in both cases).
The code between the tags is what accounts for the difference, even though it is identical in both cases.
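Because the code between the tags is identical in both setups, one way to narrow things down is to pull out its pure-Python parts (the path string and the window arithmetic) and check them in isolation, with no h5py involved. The sketch below mirrors that logic; `dataset_path` and `window_bounds` are hypothetical names, and treating `length` and `window` as seconds is an assumption inferred from the `samples_per_second` variable.

```python
from typing import Tuple


def dataset_path(cnp: int, duration: int, dnp: int) -> str:
    """Build the HDF5 path used in the question (0-based indices, 1-based group names)."""
    return f"Cnp_{cnp + 1}/Duration_{duration + 1}/Dnp_{dnp + 1}/data"


def window_bounds(n_samples: int, length: int, window: float, window_number: int) -> Tuple[int, int]:
    """Return the (begin, end) column indices of one window.

    n_samples: number of columns in the dataset (dset.shape[1])
    length: total signal length, assumed to be in seconds
    window: window size, assumed to be in seconds
    """
    if n_samples % length != 0:
        raise ValueError("n_samples must be a multiple of length")
    samples_per_second = n_samples // length
    samples_per_window = int(samples_per_second * window)
    begin = window_number * samples_per_window
    return begin, begin + samples_per_window


if __name__ == "__main__":
    print(dataset_path(0, 2, 4))               # Cnp_1/Duration_3/Dnp_5/data
    print(window_bounds(10_000, 10, 0.5, 3))   # (1500, 2000)
```

For example, with 10,000 columns, `length=10` and `window=0.5`, there are 1000 samples per second and 500 per window, so window 3 spans columns 1500 to 2000. If these helpers run equally fast in both setups, the gap has to come from the file access or tensor transfer lines rather than from this arithmetic.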
I can share the repo link and the Jupyter notebooks too.
What is happening here?