Read data from a lot of datasets?

Hi all,

     There are thousands of datasets, each about 200 MB, and I want to read part of the data from each one. In your experience, which of the following two methods is better? (Non-parallel.)
    Method 1. Open a dataset, read from it, close it, then open the next dataset, read, close, and so on.
    Method 2. Open all the datasets, read from each, then close them all?

    Or is there another HDF5 feature that can achieve this?

Thanks in advance.
tony

Hi Tony,

On Sep 9, 2009, at 8:43 AM, tony wrote:

  Each open dataset consumes a certain amount of memory (for its cached metadata), so if there are really huge numbers of datasets you may have to use Method 1. Otherwise, if you are repeatedly reading pieces of each dataset, Method 2 will work better. If you aren't revisiting datasets, though, go with Method 1.

  Quincey