Compression and chunk size issue

syriusz · May 9, 2018, 9:05am

Hi All,

My tests with the parallel HDF5 compression feature show that I have to set a chunk size that is an exact integer divisor of the number of “elements” I want to write to the given dataset.
Is this an HDF5 library constraint, or is there a workaround?

Let me explain with an example:

Let’s say we have a 1-D dataset (vector) and 1200 elements to write into it.
When I don’t use the standard GZIP (H5Pset_deflate) compression, everything is fine.
When I set the chunk size (H5Pset_chunk) to some integer divisor of 1200 (so: 1, 2, 4, …, 400, 600, 1200), everything is also fine and the dataset is compressed.
But when I set the chunk size to a value that does not divide 1200 evenly (let’s say 500), simply assuming that the last chunk will just not be completely filled with data, the library reports errors during the data write:

H5Dwrite(): can’t prepare for writing data
H5D__pre_write(): can’t write data
H5D__chunk_collective_write(): write error
H5D__chunk_collective_io(): couldn’t finish filtered linked chunk MPI-IO
H5D__link_chunk_filtered_collective_io(): couldn’t process chunk entry
H5D__filtered_collective_chunk_entry_io(): couldn’t unfilter chunk for modifying
…
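
The chunk arithmetic in this example can be sketched as below. This is only an illustration of the numbers involved, not HDF5 code: `ChunkLayout` and `chunk_layout` are hypothetical names that do not appear in the library or in the thread, and the sketch assumes a 1-D dataset with a positive chunk size.

```cpp
#include <cstddef>

// Layout of a 1-D dataset split into fixed-size chunks.
struct ChunkLayout {
    std::size_t chunk_count;      // total chunks, including a partial last one
    std::size_t last_chunk_used;  // elements of real data in the last chunk
    bool has_partial_chunk;       // true when chunk_size does not divide dataset_size
};

// Hypothetical helper (not part of HDF5): computes how many chunks are needed
// to cover dataset_size elements. Assumes dataset_size > 0 and chunk_size > 0.
inline ChunkLayout chunk_layout(std::size_t dataset_size, std::size_t chunk_size) {
    const std::size_t count = (dataset_size + chunk_size - 1) / chunk_size;  // ceiling division
    const std::size_t remainder = dataset_size % chunk_size;
    return ChunkLayout{count, remainder == 0 ? chunk_size : remainder, remainder != 0};
}
```

For 1200 elements and a chunk size of 500, `chunk_layout(1200, 500)` gives 3 chunks, with only 200 elements of data in the last one. With a chunk size of 400 it also gives 3 chunks, but all of them are full.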

Best regards,
Rafal

koziol · May 10, 2018, 3:27am

Hi Rafal,
What you are doing is supposed to work. Can you please send a short example program that demonstrates the failure? (Also, which version of the library are you using?)

Quincey

syriusz · May 10, 2018, 11:41am

Hi,

Please take a look at the attached simple program, which should create an HDF5 file with a single “/test” dataset.

test.cpp (1.7 KB)

There are currently two problems:

1. This program (unmodified) cannot be run with more than 1 process.
   When I try to run it with, for example, 4 processes (calling: mpirun -n 4 ./test), I get a “segmentation fault”.
   This error seems to come from the “H5Pset_chunk” and/or “H5Pset_deflate” functions, because when I comment them out (lines 32 and 33), everything runs fine with 4 processes and the dataset is properly created (but not compressed, of course).
2. This program (unmodified) produces a correct file with a compressed dataset inside only for a “CHUNK_SIZE” value that is an integer divisor of the “DATASET_SIZE” value. Otherwise (try setting CHUNK_SIZE to, say, 500) we get the errors described in my previous post.
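
When reproducing this with several processes, it can help to work out which chunks each rank's slice overlaps. The sketch below assumes each rank writes an equal, contiguous slice of the dataset. The attached test.cpp is not reproduced in this thread, so that partitioning is an assumption, and `ChunkRange` and `chunks_touched` are hypothetical names rather than anything from HDF5 or the attachment.

```cpp
#include <cstddef>

// Inclusive range of chunk indices overlapped by one rank's slice.
struct ChunkRange {
    std::size_t first;
    std::size_t last;
};

// Hypothetical helper (not part of HDF5 or of test.cpp). Assumes the dataset
// is split into equal contiguous slices, one per rank, and that nprocs
// divides dataset_size evenly.
inline ChunkRange chunks_touched(std::size_t rank, std::size_t nprocs,
                                 std::size_t dataset_size, std::size_t chunk_size) {
    const std::size_t per_rank = dataset_size / nprocs;
    const std::size_t start = rank * per_rank;     // first element of this rank's slice
    const std::size_t end = start + per_rank - 1;  // last element of this rank's slice
    return ChunkRange{start / chunk_size, end / chunk_size};
}
```

With 4 ranks, 1200 elements and a chunk size of 500, `chunks_touched(1, 4, 1200, 500)` covers chunks 0 through 1 and `chunks_touched(3, 4, 1200, 500)` covers chunks 1 through 2, so under this assumed partitioning some chunks are shared between ranks. Whether that is related to the reported errors is not established in the thread.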

Both issues were tested with the latest HDF5 library, version 1.10.2 (parallel).

I would be grateful for any help with both issues.

Best regards,
Rafal

koziol · May 10, 2018, 3:00pm

Hi Rafal,
Your program looks valid. I would suggest submitting it to help@hdfgroup.org as a bug.

	Quincey

syriusz · May 11, 2018, 6:39am

Hi Quincey,

Sure, I will send a bug report to the email address you suggested, but could you please confirm that both issues appear on your side too? If you have not run the program yet, could you please test it in the way I’ve described and tell me what results you get?

Best regards,
Rafal

koziol · May 11, 2018, 5:43pm

Hi Rafal,
Yes, your test code fails for me in the same way you are seeing.

	Quincey

epourmal · May 21, 2018, 12:19am

Hi Rafal and Quincey,

For your reference, we have entered an issue in our JIRA database: HDFFV-10470.

Thank you for reporting!

Elena

syriusz · May 21, 2018, 8:57am

Hi Elena,

Thank you for this information.
Can I track this issue somewhere on the web, or is it an internal JIRA system for your own purposes?
Could you please let me know about any updates on this issue?

Thank you.

Regards,
Rafal

epourmal · May 24, 2018, 12:50am

Hi Rafal,

Our JIRA is currently internal, but we have been working on opening it to the public.
And, yes, we will contact you when we have a fix.

Thank you!

Elena
