Read a compressed (deflate) block using Java

Hi all,
  I'm working on a project that deals with HDF4 files provided by NASA.
It is part of a larger project at the University of Minnesota (
http://spatialhadoop.cs.umn.edu/) that uses Hadoop as a distributed
environment to process large datasets, including satellite data. I used to
use the HDFJava wrappers to deal with HDF files, and they worked fine.
However, since they depend on native libraries, I frequently have trouble
distributing the code over a heterogeneous cluster of different
architectures and operating systems. So I decided to write my own pure-Java
code to open HDF files. For now, I'm only writing the minimal code needed
to support the files I'm working on.

I followed the specifications provided on the website and was able to
parse the structure of the HDF file and reach the data blocks that contain
the main data (temperature, in my case). These blocks are compressed with
deflate at level 1, as indicated by the file. However, when I try to
decompress these blocks, the output does not have the expected size. I
suspect there is some decompression parameter that I'm setting incorrectly.
Here's the code I wrote to decompress a block:

    byte[] compressedData = readRawData(in); // Reads the raw data from the block
    DeflaterInputStream dis = new DeflaterInputStream(
        new ByteArrayInputStream(compressedData), new Deflater(1, true)); // Creates a decompressor
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); // A temporary place to store decompressed data
    // This loop reads the uncompressed data and caches it in memory
    byte[] buffer = new byte[4096];
    int bufferLength;
    while ((bufferLength = dis.read(buffer)) > 0) {
      baos.write(buffer, 0, bufferLength);
    }
    dis.close();
    baos.close();
    // Retrieve the decompressed data
    uncompressedData = baos.toByteArray();

The header of the compressed data indicates that the size after
decompression should be 2880000 bytes, but my code produces 1613097 bytes,
which is completely different. When creating the Deflater, I tried setting
the 'nowrap' parameter to both true and false, but neither works. The same
file works with the HDFJava library, so the bug must be in my code.

Thanks in advance for your help. Please let me know if you need more
details about this issue.

Best regards,
Ahmed Eldawy

Sorry, my bad. I had to use InflaterInputStream instead of
DeflaterInputStream. What I was doing was simply re-compressing data that
was already compressed. It works fine now.
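
For reference, the corrected snippet looks roughly like this (same
readRawData helper as before; whether the default Inflater() or
Inflater(true) is needed depends on whether the block carries a zlib
header):

    // InflaterInputStream and Inflater come from java.util.zip
    byte[] compressedData = readRawData(in); // Reads the raw data from the block
    InflaterInputStream iis = new InflaterInputStream(
        new ByteArrayInputStream(compressedData), new Inflater()); // A decompressor, not a compressor
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); // Holds the decompressed data
    byte[] buffer = new byte[4096];
    int bufferLength;
    while ((bufferLength = iis.read(buffer)) > 0) {
      baos.write(buffer, 0, bufferLength);
    }
    iis.close();
    byte[] uncompressedData = baos.toByteArray(); // Should now match the size given in the header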

Best regards,
Ahmed Eldawy

Hi Ahmed,

I understand that you found the problem, but I just want to mention the HDF4 Mapping tool: http://www.hdfgroup.org/projects/h4map/, which might be useful for your application. This tool will give you a "map" that describes the structure of an HDF file.
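
For example, since the map is an XML file, you could use it from plain Java (no native code) to locate where each data block lives in the file. A rough sketch is below; the element and attribute names here are only illustrative, so please consult the map schema on the project page for the real ones:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class MapReader {
      public static void main(String[] args) throws Exception {
        // Parse the XML map generated by the h4map tool for an HDF4 file
        // ("file.map.xml" is just a placeholder name).
        Document map = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(new File("file.map.xml"));
        // "byteStream", "offset" and "nBytes" are illustrative names for the
        // entries that record where each block of data sits in the HDF4 file.
        NodeList streams = map.getElementsByTagName("byteStream");
        for (int i = 0; i < streams.getLength(); i++) {
          Element s = (Element) streams.item(i);
          long offset = Long.parseLong(s.getAttribute("offset"));
          long nBytes = Long.parseLong(s.getAttribute("nBytes"));
          System.out.println("block at offset " + offset + ", " + nBytes + " bytes");
        }
      }
    }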

Binh-Minh
