H5Fcreate when using multiple nodes and multiple processors per node

I'm using the example code here:
http://www.hdfgroup.org/ftp/HDF5/examples/parallel/Hyperslab_by_row.c

When I run this code with 8 processes on separate nodes, it works fine.

When I run this code with 8 processes on the same node (ppn=8), it works
fine.

When I run this code on 2 nodes with 4 processes per node (ppn=4), I get a
segmentation fault in H5Fcreate.

Any help or guidance would be much appreciated.

This works usually fine.
How did you compile the library and with which MPI library?

Cheers,

Matthieu

2013/10/17 Aaron Friesz <friesz@usc.edu>:

I'm using the example code here:
http://www.hdfgroup.org/ftp/HDF5/examples/parallel/Hyperslab_by_row.c

When I run this code with 8 processes on separate nodes, it works fine.

When I run this code with 8 processes on the same node (ppn=8), it works
fine.

When I run this code on 2 nodes with 4 processes per node (ppn=4), I get a
segmentation fault in H5Fcreate.

Any help or guidance would be much appreciated.

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/

I would consider HDF to be a pretty silly library if this were expected
behavior. :) Clearly there is something wrong with how I have it set up or
how the cluster is reacting to it. I just have no idea what it could be.

We use MPICH-MX <http://www.myricom.com/support/downloads/mx/mpich-mx.html>.

I compile using GNU 3.4.3. The system uses PVFS.

My compile is fairly simple:

#!/bin/tcsh

make clean

# mpicc must be in the PATH first
setenv CC "mpicc"

./configure --prefix=[installDir] | tee hdf5cfg.log

make | tee make.log
make install prefix=[installDir] | tee make_install.log

Again, any help or advice on fixing this is greatly appreciated.

Aaron,

This is not expected behavior.

Please provide your configure and make output.
Which version of MPICH is your Myricom implementation based on?

Also, please try this program and make sure that it works with every ppn distribution:
http://www.hdfgroup.org/ftp/HDF5/examples/misc-examples/parallel/Sample_mpio.c

Thanks,
Mohamad

Mohamad,

How would I find the MPICH version for myricom? If there is a way to query
the system, I'll do that tomorrow when I re-run the mpio test (I've run it
before, but I don't know that I tried different ppn vs. node distributions).

Also for the configure and make output, can I simply send those as an
attachment with a reply email?

Inline..

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Aaron Friesz
Sent: Friday, October 18, 2013 11:43 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] H5Fcreate when using multiple nodes and multiple processors per node

> Mohamad,
>
> How would I find the MPICH version for myricom? If there is a way to query the system, I'll do that tomorrow when I re-run the mpio test (I've run it before, but I don't know that I tried different ppn vs. node distributions).

[msc] maybe mpicc -version :-?

> Also for the configure and make output, can I simply send those as an attachment with a reply email?

[msc] Sure.

Mohamad

Mohamad,

Thanks for your help.

When I ran Sample_mpio.c with 2 nodes and 2 ppn, I again got segmentation
faults. It works fine when the processes are all on separate nodes. It
also works when all processes are on the same node.

What could cause this? What should I tell the system administrator needs
to be fixed?

I haven't been able to figure out a version number for myricom yet.

Hi Aaron,

This seems to be a problem with your MPI library setup with PVFS(2?).
Maybe Rob will glance over this thread and can tell you what the problem is.

What puzzles me is that you say your program works with 1 process per node. That rules out the case where the file is not visible from other nodes.
So I am not really sure what the problem is here.

Thanks,
Mohamad
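
One way to test the "file not visible from other nodes" hypothesis directly is to have every process try to create and remove a small probe file in the output directory. The following is only a sketch under assumptions: the function name can_create_in and the probe file name are hypothetical and not part of HDF5 or MPI. In a real job each rank would presumably call it with a rank-specific probe name (the rank coming from MPI_Comm_rank) so that processes do not race on the same file.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a file named probe_name can be created, written, and then
 * removed inside dir; returns 0 otherwise.
 * Both the function and the probe name are hypothetical helpers. */
int can_create_in(const char *dir, const char *probe_name)
{
    char path[4096];
    assert(dir != NULL && probe_name != NULL);

    int n = snprintf(path, sizeof path, "%s/%s", dir, probe_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;                      /* path would have been truncated */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return 0;                      /* directory missing or not writable */

    if (fputs("probe\n", f) == EOF) {  /* write failed */
        fclose(f);
        remove(path);
        return 0;
    }
    if (fclose(f) != 0) {              /* flush to the file system failed */
        remove(path);
        return 0;
    }
    return remove(path) == 0;          /* clean up; report success */
}
```

If every rank reports success for the target directory under each ppn layout, plain file visibility is unlikely to be the cause, which points back at the MPI-IO layer.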

I gathered from your reply that my results were strange, so I went back and
tested again. It seems I messed up the first time and ran it twice on one
node with 4 ppn.

It does indeed fail when all the processes are on separate nodes (H5Fcreate
only failed in the mixed case). Does this mean something specific?
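
Since one of the runs turned out to use a different process placement than intended, it may help to have the test program report the layout it actually ran with. The sketch below classifies a list of per-rank host names as single-node, one-process-per-node, or mixed. The enum and function names are hypothetical; the assumption is that each rank obtains its host name with MPI_Get_processor_name and that rank 0 gathers them into an array before calling this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the three placements discussed in this thread. */
enum layout { LAYOUT_SINGLE_NODE, LAYOUT_ONE_PER_NODE, LAYOUT_MIXED };

/* Count distinct host names in hosts[0..n-1].
 * Quadratic in n, which is fine for a handful of ranks. */
static size_t count_distinct_hosts(const char *const hosts[], size_t n)
{
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(hosts[i], hosts[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Classify the placement of n ranks given one host name per rank.
 * A single rank (n == 1) is reported as LAYOUT_SINGLE_NODE. */
enum layout classify_layout(const char *const hosts[], size_t n)
{
    assert(hosts != NULL && n > 0);

    size_t nodes = count_distinct_hosts(hosts, n);
    if (nodes == 1)
        return LAYOUT_SINGLE_NODE;     /* e.g. 1 node, ppn=4 */
    if (nodes == n)
        return LAYOUT_ONE_PER_NODE;    /* e.g. 4 nodes, ppn=1 */
    return LAYOUT_MIXED;               /* e.g. 2 nodes, ppn=2 */
}
```

Printing this classification at the start of each run makes it easier to match a segmentation fault to the placement that produced it.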
