HDF and large files

First, a confessional: I just joined the list. I'm not an HDF5 expert.
I'm using HDF5 1.6.6, not the latest. So if this is old news or already
fixed, I apologize in advance.

I work on the IBM Bluegene MPI library. I just investigated a
customer-reported problem using H5Dwrite() on large files on many (32-bit)
nodes. It turned out to be related to work we're doing on
mpich2-dev@mcs.anl.gov to support large files on 32 bit versions of
MPICH2/ROMIO. ROMIO supports large files and supports 64 bit offsets and
file lengths, but sometimes it uses 32 bit variables - particularly when
the typedef "MPI_Aint" is a 32 bit integer. So we're modifying the 32
bit MPICH2/ROMIO libraries to use 64 bit MPI_Aint's (and a variety of
other internal changes) on Bluegene.

HDF5 itself uses MPI_Aint's with MPICH2/ROMIO. I recompiled HDF5 1.6.6
with our in-development library using 64 bit MPI_Aint's and the 32 bit
MPICH/ROMIO library. The customer problem went away. I tested 6G - 130G
on 8 - 1024 nodes. Seems fine.

I didn't examine HDF5 code very closely, but the recompile with 64 bit
MPI_Aint's did fix code in H5Smpio.c like:

displacement[2] = (MPI_Aint)elmt_size * max_xtent[i]; was calculated to be
0xa0000000 = 0x68 * 0x4000000, which should have been 0x1a0000000 (the
product overflows a 32 bit MPI_Aint).

So what I really wanted to tell you was that I did notice at least one
place where an MPI_Aint is being cast to (int). That's a potential
problem not only with our new library but also with the existing 64 bit
MPICH2/ROMIO library on LP64 machines where MPI_Aint will be 64 bit and
integers will be 32 bit.

That's all. Thanks.

Bob Cernohous: (T/L 553) 507-253-6093

BobC@us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829

···

Chaos reigns within.
Reflect, repent, and reboot.
Order shall return.

Hi Bob,

So what I really wanted to tell you was that I did notice at least one place where an MPI_Aint is being cast to (int). That's a potential problem not only with our new library but also with the existing 64 bit MPICH2/ROMIO library on LP64 machines where MPI_Aint will be 64 bit and integers will be 32 bit.

  Hmm, we could work on fixing this if we have a test case that we can work with. Do you have a test case/environment we can use?

    Quincey

···

On Apr 2, 2008, at 10:44 AM, Bob Cernohous wrote:


----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

> So what I really wanted to tell you was that I did notice at least
> one place where an MPI_Aint is being cast to (int). That's a
> potential problem not only with our new library but also with the
> existing 64 bit MPICH2/ROMIO library on LP64 machines where MPI_Aint
> will be 64 bit and integers will be 32 bit.

> Hmm, we could work on fixing this if we have a test case that we can
> work with. Do you have a test case/environment we can use?

I can't send in the customer's testcase. But I looked through your
shipped example code and if you crank ph5example up to create a 7.7G file,
then it fails in dataset_vrfy. I believe this is the same problem that I
saw. It calculates 32 bit MPI_Aint/offsets/indices wrong and writes to
the wrong location, so read/verify fails.

This fails with a 32 bit MPI_Aint/MPICH/ROMIO library. It works with our
in-development 64 bit MPI_Aint/32 bit MPICH/ROMIO library. It probably
works with a pure 64 bit MPICH/ROMIO library - I can't test that.

I don't have a testcase that shows any problem with (MPI_Aint)->(int)
casts in HDF5. I just thought it was a potential problem with both 32
bit/64 bit libraries.

I'm attaching the patch for ph5example that I used. Basically up the
dims from 24x24 to 32000x32000 and run on 32 nodes.

Bob Cernohous: (T/L 553) 507-253-6093

BobC@us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829

ph5example.patch (8.12 KB)

···

Quincey Koziol <koziol@hdfgroup.org> wrote on 04/02/2008 03:10:02 PM:
> On Apr 2, 2008, at 10:44 AM, Bob Cernohous wrote:


Hi Bob,

> > So what I really wanted to tell you was that I did notice at least
> > one place where an MPI_Aint is being cast to (int). That's a
> > potential problem not only with our new library but also with the
> > existing 64 bit MPICH2/ROMIO library on LP64 machines where MPI_Aint
> > will be 64 bit and integers will be 32 bit.
>
> Hmm, we could work on fixing this if we have a test case that we can
> work with. Do you have a test case/environment we can use?
>

I'm attaching the patch for ph5example that I used. Basically up the dims from 24x24 to 32000x32000 and run on 32 nodes.

  Cool, thanks! I'll file it in our bug database and try to get it on the bugfix schedule.

    Quincey

···

On Apr 7, 2008, at 3:55 PM, Bob Cernohous wrote:


Hi HDF users/ support,

I am using HDF5 1.6.6 (older), C++ version.
Basically my issue is that a file doesn't get created if I make use of
H5F_ACC_CREAT.

The function below accepts four file open flags: H5F_ACC_CREAT,
H5F_ACC_EXCL, H5F_ACC_TRUNC, H5F_ACC_DEBUG.
File: H5File.cpp:


---------------------
void H5File::p_get_file(const char* name, unsigned int flags, const
FileCreatPropList& create_plist, const FileAccPropList& access_plist)
{
    if( flags & (H5F_ACC_CREAT|H5F_ACC_EXCL|H5F_ACC_TRUNC|H5F_ACC_DEBUG))
    {
        hid_t create_plist_id = create_plist.getId();
        hid_t access_plist_id = access_plist.getId();
        id = H5Fcreate( name, flags, create_plist_id, access_plist_id );
--
--
--
}

However, the function below doesn't allow the H5F_ACC_CREAT flag at all and
hence throws an exception.

File: H5F.c
------------
H5Fcreate(const char *filename, unsigned flags, hid_t fcpl_id, hid_t
fapl_id)
{
    H5F_t *new_file = NULL; /*file struct for new file */
    hid_t ret_value; /*return value */

    FUNC_ENTER_API(H5Fcreate, FAIL)
    H5TRACE4("i","sIuii",filename,flags,fcpl_id,fapl_id);

    /* Check/fix arguments */
    if (!filename || !*filename)
        HGOTO_ERROR(H5E_ARGS, H5E_BADVALUE, FAIL, "invalid file name")
    if (flags & ~(H5F_ACC_EXCL|H5F_ACC_TRUNC|H5F_ACC_DEBUG))
        HGOTO_ERROR(H5E_ARGS, H5E_BADVALUE, FAIL, "invalid flags")
    if ((flags & H5F_ACC_EXCL) && (flags & H5F_ACC_TRUNC))
        HGOTO_ERROR (H5E_ARGS, H5E_BADVALUE, FAIL, "mutually exclusive
flags for file creation")

Thanks
Anish

Hi Anish,

> Hi HDF users/support,
>
> I am using HDF5 1.6.6 (older), C++ version.
> Basically my issue is that a file doesn't get created if I make use of H5F_ACC_CREAT.

  Hmm, it does look like the H5F_ACC_CREAT flag isn't being allowed correctly. Since the application is calling H5Fcreate(), passing the H5F_ACC_CREAT flag is redundant, but I agree that it should be allowed anyway.

  I'll file a bug report for this.

  Thanks,
    Quincey

···

On Apr 10, 2008, at 1:44 PM, Anish Anto wrote:
