Difficulty reading VIIRS satellite data using the HDF5 C++ API

Greetings everyone,

I am attempting to use the HDF5 C++ API to read data from the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite. VIIRS Sensor Data Records (SDR) can be downloaded freely from NOAA’s Comprehensive Large Array-Data Stewardship System (CLASS) at this URL: http://www.nsof.class.noaa.gov/saa/products/search?sub_id=0&datatype_family=VIIRS_SDR&submit.x=19&submit.y=3.

The problem I am encountering is that the data is stored in large arrays. Some datasets are 6144 × 6400 arrays of 32-bit floats, which works out to roughly 150 MB (6144 × 6400 × 4 bytes). Declared as a plain C++ array, that buffer lands on the stack, and there is nowhere near enough stack space for that size.

Here is a condensed section of C++ code:
const int X = 6144;
const int Y = 6400;
float geoArray[X][Y];   // ~150 MB automatic (stack) array
...
dataSet.read(geoArray, PredType::NATIVE_FLOAT, memSpace, dataSpace);

Even when I raise my computer’s stack size limit to 65,532 kilobytes, or attempt to dynamically allocate the array, I still get a segmentation fault. I am currently using h5dump to create binary files that I can then read into C++ vectors (which lets the data live on the heap rather than the limited stack). Does the dataset read function have any provision for accessing values individually rather than as a full array? How else can I access such large amounts of data using the HDF5 C++ API?

Thank you and best regards,
Lance

Is there any reason you must allocate that on the stack rather than perform
a dynamic allocation? Most people would say you shouldn't put large data
structures ("large" meaning anything above a couple hundred KB) on the
stack. Default stack sizes are typically only a few megabytes, and thread
stacks are often much smaller.
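
For example, a minimal sketch reusing the dataSet, memSpace, and dataSpace
handles from your snippet (untested; it assumes the 6144 × 6400 float
dataset you described):

#include <memory>

const size_t X = 6144;
const size_t Y = 6400;

// ~150 MB buffer allocated on the heap instead of the stack
std::unique_ptr<float[]> geoArray(new float[X * Y]);

dataSet.read(geoArray.get(), PredType::NATIVE_FLOAT, memSpace, dataSpace);

// element (i, j) is geoArray[i * Y + j]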

Accessing the data in smaller chunks gets less efficient as you approach
one item at a time, but you can take a slice of the data to reduce what you
are looking at. Take a look at hyperslab selections as a starting point
(sketch below). However, on a reasonably equipped desktop system I would
just allocate the buffer on the heap and read the whole dataset into it.
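
If you do want to read in pieces, a hyperslab loop looks roughly like
this. It is a sketch, not tested against your files: the filename and
dataset path are placeholders, and ROWS_PER_READ is just a tuning knob.

#include <algorithm>
#include <vector>
#include "H5Cpp.h"
using namespace H5;

int main() {
    const hsize_t X = 6144;                      // rows in the dataset
    const hsize_t Y = 6400;                      // columns in the dataset
    const hsize_t ROWS_PER_READ = 512;           // slab height; tune to taste

    // Placeholder filename and dataset path -- substitute your own.
    H5File file("viirs_sdr_granule.h5", H5F_ACC_RDONLY);
    DataSet dataSet = file.openDataSet("/All_Data/Example/Latitude");
    DataSpace fileSpace = dataSet.getSpace();

    std::vector<float> block(ROWS_PER_READ * Y); // heap buffer for one slab

    for (hsize_t row = 0; row < X; row += ROWS_PER_READ) {
        hsize_t count[2]  = { std::min(ROWS_PER_READ, X - row), Y };
        hsize_t offset[2] = { row, 0 };

        // Select the slab in the file, and describe a matching shape in memory.
        fileSpace.selectHyperslab(H5S_SELECT_SET, count, offset);
        DataSpace memSpace(2, count);

        dataSet.read(block.data(), PredType::NATIVE_FLOAT, memSpace, fileSpace);
        // ... process count[0] * Y floats in block ...
    }
    return 0;
}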

-Jason

Hey Lance, why not use a standard C++ vector?

#include <vector>

...

std::vector<float> geoArray(static_cast<size_t>(X) * Y);   // buffer lives on the heap

dataSet.read(geoArray.data(), PredType::NATIVE_FLOAT, memSpace, dataSpace);

...

This gives you a 1D array, but you can then use a small inline function
(or a macro) to implement 2D [i][j] access, if you’d like.
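
For instance, a minimal sketch (the helper name "at" is mine; it assumes
row-major storage with Y columns, matching the original float geoArray[X][Y]
layout):

#include <cstddef>
#include <vector>

// Map 2D (i, j) indices onto the flat, row-major vector.
inline float& at(std::vector<float>& a, std::size_t i, std::size_t j,
                 std::size_t cols) {
    return a[i * cols + j];
}

// usage: float v = at(geoArray, i, j, Y);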

G.
