HDF5 compression problem

Our instrument control software uses HDF5 files to store neutron acquisition
data.
As the size of the "data" group grows, we see seemingly random compression:
sometimes the dataset is compressed, sometimes not. Here are the dumps of two
files containing the same dataset but with different resulting compression:

Bad file:

HDF5 "000028.nxs" {
GROUP "/" {
   ATTRIBUTE "HDF5_Version" {
      DATATYPE H5T_STRING {
            STRSIZE 5;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
      DATASPACE SCALAR
   }
   GROUP "entry0" {
      ATTRIBUTE "NX_class" {
         DATATYPE H5T_STRING {
               STRSIZE 7;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE SCALAR
      }
      GROUP "data" {
         ATTRIBUTE "NX_class" {
            DATATYPE H5T_STRING {
                  STRSIZE 6;
                  STRPAD H5T_STR_NULLTERM;
                  CSET H5T_CSET_ASCII;
                  CTYPE H5T_C_S1;
               }
            DATASPACE SCALAR
         }
         DATASET "data" {
            DATATYPE H5T_STD_I32LE
            DATASPACE SIMPLE { ( 384, 256, 1024 ) / ( 384, 256, 1024 ) }
            STORAGE_LAYOUT {
               CHUNKED ( 384, 256, 1024 )
               SIZE 402653184 (1.000:1 COMPRESSION)
             }
            FILTERS {
               COMPRESSION DEFLATE { LEVEL 6 }
            }
            FILLVALUE {
               FILL_TIME H5D_FILL_TIME_IFSET
               VALUE 0
            }
            ALLOCATION_TIME {
               H5D_ALLOC_TIME_INCR
            }
            ATTRIBUTE "signal" {
               DATATYPE H5T_STD_I32LE
               DATASPACE SCALAR
            }
         }
      }

Correct file:

HDF5 "000029.nxs" {
GROUP "/" {
   ATTRIBUTE "HDF5_Version" {
      DATATYPE H5T_STRING {
            STRSIZE 5;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
      DATASPACE SCALAR
   }
   GROUP "entry0" {
      ATTRIBUTE "NX_class" {
         DATATYPE H5T_STRING {
               STRSIZE 7;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE SCALAR
      }
      GROUP "data" {
         ATTRIBUTE "NX_class" {
            DATATYPE H5T_STRING {
                  STRSIZE 6;
                  STRPAD H5T_STR_NULLTERM;
                  CSET H5T_CSET_ASCII;
                  CTYPE H5T_C_S1;
               }
            DATASPACE SCALAR
         }
         DATASET "data" {
            DATATYPE H5T_STD_I32LE
            DATASPACE SIMPLE { ( 384, 256, 1024 ) / ( 384, 256, 1024 ) }
            STORAGE_LAYOUT {
               CHUNKED ( 384, 256, 1024 )
               SIZE 139221680 (2.892:1 COMPRESSION)
             }
            FILTERS {
               COMPRESSION DEFLATE { LEVEL 6 }
            }
            FILLVALUE {
               FILL_TIME H5D_FILL_TIME_IFSET
               VALUE 0
            }
            ALLOCATION_TIME {
               H5D_ALLOC_TIME_INCR
            }
            ATTRIBUTE "signal" {
               DATATYPE H5T_STD_I32LE
               DATASPACE SCALAR
            }
         }
      }

Compression type: NX_COMP_LZW (which, as the dumps show, ends up as the HDF5 DEFLATE filter).
HDF5 version 1.8.3, called through the NeXus library 4.3.0.
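
For reference, the writing goes through the NeXus API, which boils down to HDF5 calls roughly like the following (a sketch with illustrative names, not our actual code):

#include "hdf5.h"

/* Sketch: how the "data" dataset ends up on disk -- one chunk covering the
   whole 384 x 256 x 1024 extent, DEFLATE (gzip) level 6, as in the dumps
   above.  'loc' is the open "data" group, 'buf' the acquisition buffer. */
hid_t write_acquisition(hid_t loc, const int *buf)
{
    hsize_t dims[3] = {384, 256, 1024};
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 3, dims);      /* single chunk = entire dataset */
    H5Pset_deflate(dcpl, 6);          /* gzip level 6 */

    hid_t dset = H5Dcreate2(loc, "data", H5T_STD_I32LE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}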

Is there an explanation for such random behaviour? Any solutions?

···


Hmm. I thought the HDF5 library was generally 'smart' about compressing
and would abort a compression if the resultant 'compressed' byte stream
turned out to be larger than the UNcompressed byte stream. For small
datasets, say less than a few hundred bytes, I suppose it's highly
likely that 'compression' in fact turns out a bit larger.

Those two behaviours could explain why some of your datasets wind up
compressed and others not.
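
For what it's worth, one quick way to check which of those cases you are hitting is to compare a dataset's allocated storage with its logical size. A sketch (the helper name is mine; it assumes the dataset is already open):

#include <stdio.h>
#include "hdf5.h"

/* Sketch: compare a dataset's logical (uncompressed) size with the storage
   actually allocated for it.  A ~1.000:1 ratio on a deflate-filtered
   dataset means its chunks were stored uncompressed. */
static void report_compression(hid_t dset)
{
    hid_t   space   = H5Dget_space(dset);
    hid_t   dtype   = H5Dget_type(dset);
    hsize_t logical = (hsize_t)H5Sget_simple_extent_npoints(space)
                      * H5Tget_size(dtype);
    hsize_t stored  = H5Dget_storage_size(dset);

    printf("logical %llu bytes, stored %llu bytes (%.3f:1)\n",
           (unsigned long long)logical, (unsigned long long)stored,
           stored ? (double)logical / (double)stored : 0.0);

    H5Tclose(dtype);
    H5Sclose(space);
}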

Best I can explain given the short time I've thought about the answer :wink:

Mark

···


Hello,

Would it be possible for you to send us an example that demonstrates the problem? Could you please also send those two files to help@hdfgroup.org?

It would also help to know how many datasets you have in a "data" group when you see this behavior. Which version of the gzip library are you using? Which OS and compiler? Have you tried your application with the latest HDF5?

Thank you!

Elena

···


On Nov 13, 2012, at 11:51 AM, Mark Miller <miller86@llnl.gov> wrote:

Those two behaviours could explain why some of your datasets wind up
compressed and others not.

Yup. That's very likely part of what's going on.

    Quincey

···


I have explored the problem and am now able to give an explanation, so I do not have to send you the files :slight_smile:

I traced the code and found, after a bunch of printfs, that HDF5 was failing to allocate the buffer for compression. For the biggest files (datasets of 384 x 256 x 1024 int32 values, i.e. a single chunk of about 400 MB), the error occurs in the function H5Z_filter_deflate, where
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "unable to allocate deflate destination buffer") is executed.

However, this error is not logged when I use the standard approach of defining an error handler:

herr_t error_handler(hid_t err_stack, void *unused) {
    /* print the error stack that is passed in (v2 error API) */
    H5Eprint2(err_stack, stderr);
    return 0;
}

H5Eset_auto2(H5E_DEFAULT, error_handler, NULL);

How can I get this error properly logged? (The reason it is not seems to be that H5_IS_API(H5Z_filter_deflate) = 0, so the failure never reaches the API-level error reporting.)
This is very important to me: execution succeeds, and the only visible consequence is that compression is silently skipped and the written file is big.
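
One thing I will try (a sketch; my assumption, from reading the library sources, is that H5Pset_deflate() registers DEFLATE as an optional filter, which is why a failed compression silently falls back to writing the raw chunk) is to register the filter as mandatory with H5Pset_filter(), so that the allocation failure makes H5Dwrite fail loudly:

#include "hdf5.h"

/* Sketch: register DEFLATE as MANDATORY instead of using H5Pset_deflate()
   (which, if I read the sources right, marks it optional).  If the filter
   then fails -- e.g. the deflate destination buffer cannot be allocated --
   the write fails with an error on the stack instead of silently storing
   the chunk uncompressed. */
hid_t make_dcpl_with_mandatory_gzip(int ndims, const hsize_t *chunk_dims)
{
    unsigned level = 6;                        /* gzip level, as before */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, ndims, chunk_dims);
    H5Pset_filter(dcpl, H5Z_FILTER_DEFLATE, H5Z_FLAG_MANDATORY, 1, &level);
    return dcpl;
}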

After I discovered the error, I realized that I was using a single chunk for the whole dataset (we are not experienced HDF5 users, and we had no such problem in the past with HDF4).
I then tested the write with different chunk sizes on my PC (Intel Xeon CPU E5530 @ 2.40GHz, 4 cores, 4 GB RAM, Linux SLED 11) for a file containing a dataset of size 384 x 256 x 1200.

Here are my results:

Chunk size     Global max   HDF5 max   HDF5 memory   Write time
               memory       memory     at end        (seconds)

dataset size   43%          19%        0%            92
2^20           30%          6%         0%            64
2^15           30%          6%         0%            24
2^12           45%          21%        ~8%           22
2^10           memory overflow

(Note: the HDF5 memory footprints are deduced from the global memory footprints, because the system uses 24% when no HDF5 write is running and 1.5% at the end of the write.)

For chunk sizes 2^20 and 2^15, why is the memory footprint the same?
For 2^12, things become unstable: the memory peak is higher than for 2^15 and 2^20, and the memory remaining at the end suggests a leak.
For 2^10, things get worse: the program exits unexpectedly with a memory overflow.

In my opinion, the write should remain stable for any chunk size that can be allocated, and the write failure for 2^10 is a problem. The writing algorithm should scale down to small chunk sizes.

Am I missing something?

Those results did show me, however, that there is an optimal chunk size, but it seems to be highly system- and hardware-dependent.
Do we have to calibrate the optimal chunk size for each system/hardware combination?
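
In the meantime, we will probably switch to a per-frame chunk shape instead of one huge chunk. A minimal sketch (the (1, 256, 1024) shape is my assumption that one detector frame is the natural access unit; with int32 values that is a 1 MiB chunk):

#include "hdf5.h"

/* Sketch: chunk the dataset per detector frame rather than as one block.
   Each (1, 256, 1024) chunk of int32 is 1 MiB, so the deflate destination
   buffer stays small and memory use no longer grows with the dataset. */
hid_t make_dcpl_per_frame(void)
{
    hsize_t chunk[3] = {1, 256, 1024};   /* one frame = 1 MiB of int32 */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 3, chunk);
    H5Pset_deflate(dcpl, 6);
    return dcpl;
}

If the chunk cache then needs tuning, H5Pset_chunk_cache() on the dataset access property list (added in 1.8.3, if I read the release notes correctly) can bound the cache memory per dataset.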

···


Unfortunately, I think that is not the right explanation.
The bad file is 386 MB and the correct file is 135 MB, so they cannot be considered small datasets. Moreover, what is strange is that the two files contain the same data, produced sequentially using a simulation mode (for real data, the compressed file is about 10 MB).
That is why I am talking about random, or non-deterministic, compression.

···


We save datasets of size 384 x 256 x channels, with channels varying from small values up to 1024.
For channels <= 512, the compression is OK.
For channels = 1024, the compression is randomly bad. (At 1024 channels, a single chunk is 384 x 256 x 1024 x 4 = 402,653,184 bytes, the uncompressed SIZE shown in the bad dump, which is consistent with the deflate destination buffer allocation failure described above.)
