Hi, I am a bit confused. I am using h5py to read datasets from a hierarchical, multi-group HDF5 file.
First I implemented a NN in a Jupyter notebook on Google Colab. Then I decided to move most of the code into custom libraries and only run a command with arguments from the Jupyter notebook in Colab, but the run was much slower!
In both cases, the code that reads the datasets from the HDF5 files lives in custom libraries (.py files called from Jupyter).
The only difference is that in one case I loop through each epoch and batch from Jupyter, while in the new (and much slower) implementation I call custom libraries to loop through epochs and batches. That's all.
In the latter case, the code is much slower. I carefully tracked down the origin of the delay, and it occurs when reading and managing datasets in h5py.
For instance, the following code
    def _get_signal_window(self, with_labels=False):
        if (self.get_number_of_avail_windows() == 0):
            self._reset_random_wind()
        sample = self._get_sample()
        Cnp = sample
        Duration = sample
        Dnp = sample
        window_number = sample
        # >>>>>>>>>>>>> HERE IS THE DIFFERENCE
        dset = self.File['Cnp_' + str(Cnp+1) + '/Duration_' + str(Duration+1) + '/Dnp_' + str(Dnp+1) + '/data']
        assert dset.shape % self.length == 0
        samples_per_second = int(dset.shape / self.length)
        samples_per_window = int(samples_per_second * self.window)
        begin = window_number * samples_per_window
        end = begin + samples_per_window
        time_window = torch.Tensor(dset[0,begin:end]).to(self.device)
        clean_signal = torch.Tensor(dset[1,begin:end]).to(self.device)
        noisy_signal = torch.Tensor(dset[2,begin:end]).to(self.device)
        if with_labels:
            starts, widths, amplitudes, categories, number_of_pulses, average_width, average_amplitude = self._get_labels(time_window, Cnp, Duration, Dnp)
            # >>>>>>>>>>>>> HERE IS THE DIFFERENCE
            return time_window, clean_signal, noisy_signal, starts, widths, amplitudes, categories, number_of_pulses, average_width, average_amplitude
        else:
            return time_window, clean_signal, noisy_signal
runs at least 6 times faster when all the code is in Jupyter than when I use custom libraries (remember, the code above is in a custom library in both cases).
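To make the indexing in that function easier to follow, here is a minimal sketch of just the window arithmetic, pulled out into a standalone function. The name `window_bounds` and its parameters are hypothetical and not part of my library: `n_samples` stands in for the number of samples along the time axis of the dataset, `length` for the signal duration in seconds, and `window` for the window duration in seconds. All of these are assumptions made for illustration.

```python
def window_bounds(n_samples, length, window, window_number):
    """Return (begin, end) sample indices of one window.

    Hypothetical helper: n_samples is assumed to be the number of samples
    along the time axis, length the signal duration in seconds, and window
    the window duration in seconds.
    """
    if n_samples % length != 0:
        raise ValueError("n_samples must be a multiple of length")
    samples_per_second = n_samples // length
    samples_per_window = int(samples_per_second * window)
    begin = window_number * samples_per_window
    end = begin + samples_per_window
    return begin, end


# 1000 samples over 10 s gives 100 samples per second,
# so a 2 s window holds 200 samples and window 3 starts at 600.
begin, end = window_bounds(n_samples=1000, length=10, window=2, window_number=3)
print(begin, end)  # 600 800
```

Since this part is pure integer arithmetic, I would not expect it to explain a 6x slowdown by itself; the file access is the more likely place to look.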
The code between the tags is where the difference arises, and it is identical in both cases.
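One way to narrow down which step between the tags dominates is to accumulate wall-clock time per step. The sketch below uses only the standard library; `StepTimer` and the labels are hypothetical names, and the dictionary and list are stand-ins for the real HDF5 file and dataset, so this shows the measurement pattern rather than my actual measurements.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Hypothetical helper: accumulates wall-clock time and call counts per label."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def step(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def report(self):
        # Maps label -> (number of calls, total seconds)
        return {k: (self.counts[k], self.totals[k]) for k in self.totals}


timer = StepTimer()

# Stand-in for the HDF5 file: a dict keyed by the same kind of group path.
fake_file = {"Cnp_1/Duration_1/Dnp_1/data": list(range(1000))}

for _ in range(5):
    with timer.step("lookup"):
        dset = fake_file["Cnp_1/Duration_1/Dnp_1/data"]
    with timer.step("slice"):
        chunk = dset[200:400]

for label, (calls, seconds) in timer.report().items():
    print(f"{label}: {calls} calls, {seconds:.6f} s total")
```

In the real function, the same `with timer.step(...)` blocks could wrap the `self.File[...]` lookup, each `dset[...]` slice, the `.to(self.device)` transfers, and the `_get_labels` call, so that the two setups can be compared step by step.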
I can share the repo link and the Jupyter files too.
What is happening here?