Following up on my own post.
I am now fully convinced that you can't run HDF5's compression code in
parallel when the number of cores compressing is greater than the number
of files being written. The compression filters can only run 1:1 with each
compressed file. I wrote up my sequential code and quickly realized that
while I can write it such that each core compresses its own data, the
compression itself happens sequentially, not in parallel. So I end up
with essentially the same inefficiency I had before, e.g.:
! (declarations omitted)
rank2 = 2
call MPI_INIT(error)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, error)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, error)
call h5open_f(error)

NX = 500000
NY = numprocs
allocate (derps(NX))
! Fill this rank's column with random values
do i = 1, NX
   call random_number(rand)
   derps(i) = 10000 * rand
enddo

dims_full(1) = NX
dims_full(2) = NY
! One chunk spans the whole dataset; filters are applied per chunk
chunkdims(1) = NX
chunkdims(2) = NY
count(1) = NX
count(2) = 1
offset_in(1) = 0
offset_in(2) = 0
call h5screate_simple_f(rank2, count, subspace_id, error)      ! partial
call h5screate_simple_f(rank2, dims_full, fullspace_id, error) ! full
call h5pcreate_f(H5P_DATASET_CREATE_F, chunk_id, error)
call h5pset_chunk_f(chunk_id, rank2, chunkdims, error)
call h5pset_deflate_f(chunk_id, 4, error)

! Loop over ranks, each rank writing to the same file in turn
do i = 0, numprocs-1
   if (myid.eq.i) then
      offset_out(1) = 0
      offset_out(2) = i
      ! Rank 0 creates the file and dataset; later ranks reopen them
      if (myid.eq.0) then
         call h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error)
         call h5dcreate_f(file_id, dsetname, H5T_NATIVE_REAL, &
                          fullspace_id, dset_id, error, chunk_id)
      else
         call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, error)
         call h5dopen_f(file_id, dsetname, dset_id, error)
      endif
      call h5sselect_hyperslab_f(subspace_id, H5S_SELECT_SET_F, &
                                 offset_in, count, error)
      call h5sselect_hyperslab_f(fullspace_id, H5S_SELECT_SET_F, &
                                 offset_out, count, error)
      ! The deflate filter runs here, inside the serialized loop
      call h5dwrite_f(dset_id, H5T_NATIVE_REAL, derps, count, error, &
                      subspace_id, fullspace_id)
      call h5dclose_f(dset_id, error)
      call h5fclose_f(file_id, error)
   endif
   ! Everyone waits until rank i has finished before rank i+1 starts
   call mpi_barrier(MPI_COMM_WORLD, error)
enddo

call h5pclose_f(chunk_id, error)
call h5sclose_f(subspace_id, error)
call h5sclose_f(fullspace_id, error)
call h5close_f(error)
call mpi_finalize(rc)
Even though the h5pset_* calls are outside of the i loop, the compression
filters are apparently triggered during h5dwrite_f, which must be in the i
loop. I presume there is no way to change that behavior in userspace.
I guess what Rob suggested is the only way to go for doing parallel
compression with HDF5. There just doesn't appear to be a way to have
HDF5's own filters run in every MPI process at the same time. I can live
with the writing being sequential, as that happens pretty fast.
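
A rough sketch of the split Rob suggested, in Python rather than Fortran and
without HDF5 at all: each worker stands in for one MPI rank and deflates its
own slab at level 4, and only the write of the already-compressed buffers is
sequential. The names make_slab, compress_slab, NX and NRANKS are made up for
this illustration, a thread pool is only a stand-in for separate MPI
processes, and the in-memory buffer stands in for the output file.

```python
import io
import random
import zlib
from array import array
from concurrent.futures import ThreadPoolExecutor

NX = 50000    # values per rank (assumed; smaller than the 500000 above)
NRANKS = 4    # stand-in for numprocs

def make_slab(rank):
    """Return one rank's column of random values as raw C floats."""
    rng = random.Random(rank)
    return array("f", (10000.0 * rng.random() for _ in range(NX))).tobytes()

def compress_slab(raw):
    """Deflate one slab at level 4, the level used with h5pset_deflate_f."""
    return zlib.compress(raw, 4)

slabs = [make_slab(rank) for rank in range(NRANKS)]

# Compression step: every worker handles its own slab concurrently.
with ThreadPoolExecutor(max_workers=NRANKS) as pool:
    compressed = list(pool.map(compress_slab, slabs))

# Write step: sequential, in rank order, recording (offset, length) per slab.
out = io.BytesIO()
index = []
for buf in compressed:
    index.append((out.tell(), len(buf)))
    out.write(buf)

# Read back through the index to check that nothing was lost.
data = out.getvalue()
restored = [zlib.decompress(data[off:off + n]) for off, n in index]
print(restored == slabs)
```

The index of offsets and lengths is the bookkeeping HDF5 would otherwise do
for you, which is the price of compressing outside the library.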
On Thu, Dec 9, 2010 at 11:52 AM, Leigh Orf <leigh.orf@gmail.com> wrote:
On Thu, Dec 9, 2010 at 11:40 AM, Rob Latham <robl@mcs.anl.gov> wrote:
On Thu, Dec 09, 2010 at 10:57:28AM -0700, Leigh Orf wrote:
> Thanks for the information. After I sent my email I realized I left out
> some relevant information. I am not using pHDF5 but regular HDF5, but in a
> parallel environment. The only reason I am doing this is because I want
> the ability to write compressed HDF5 files (gzip, szip, scale-offset, nbit,
> etc.). As I understand it, at this point (and maybe forever) pHDF5 cannot
> do compression.
> I currently have tried two approaches with compression and HDF5 in a
> parallel environment: (1) Each MPI rank writes its own compressed HDF5
> file. (2) I create a new MPI communicator (call it subcomm) which operates
> on a sub-block of the entire domain. Each instance of subcomm (which could,
> for instance, operate on one multicore chip) does a MPI_GATHER to rank 0 of
> subcomm, and that root core does the compression and writes to disk.
What if you still did collective writes with parallel-HDF5, but you
did a little additional work in the application? If you compress each
portion of data on each MPI rank, then ask HDF5 to write out that
compressed buffer, blammo, you get parallel compression and parallel
I/O. It's not as seamless as if you asked HDF5 to do the compression
for you: I guess you'd have to find a stream-based compression
algorithm (gzip?) that can work on concatenated blocks, and annotate
the dataset with the compression algorithm you selected.
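
As a small check of the "concatenated blocks" idea, assuming gzip is the
format chosen: independently compressed gzip members can be joined
end to end and still decompress as one stream. This is only a sketch in
Python using the standard gzip module; the three byte strings are made-up
stand-ins for the data held by three ranks.

```python
import gzip

# Hypothetical per-rank payloads; real data would be each rank's array bytes.
parts = [
    b"rank0 data " * 100,
    b"rank1 data " * 100,
    b"rank2 data " * 100,
]

# Each "rank" compresses its own part independently of the others.
members = [gzip.compress(part, compresslevel=4) for part in parts]

# Concatenating the members in rank order gives a multi-member gzip stream.
stream = b"".join(members)

# A multi-member stream decompresses to the concatenated original data.
roundtrip = gzip.decompress(stream)
print(roundtrip == b"".join(parts))
```

A reader still has to be told that the dataset holds gzip data, which is the
annotation step mentioned above.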
I'd really like to be able to have HDF5 do the compression because I have
grown quite accustomed to how transparent it is. The filters are just
activated, and regardless of how you compress the data, you 'see' floating
point data when you open the file or run h5dump or whatever.
I could code things up to have each core open each hdf5 file, write each
part of its file, close it, and hand it off to the next guy, but I just have
to believe that's going to be really inefficient. It seems there should be a
way to do this by passing file handles or property lists from one MPI
process to another.
I did find this page called "Collective HDF5 Calls in Parallel" which is
interesting but it is unclear to me whether it applies to pHDF5 or just
plain HDF5.
Leigh
--
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research
in Boulder, CO
NCAR office phone: (303) 497-8200