Multithreaded writing of multiple files in Java

In https://forum.image.sc/t/hdf5-java-parallel/25560/1 a user wanted to write to multiple different HDF5 files in parallel. While looking into the issue, I came across https://stackoverflow.com/questions/33040652/how-to-create-multiple-instances-of-the-same-library-with-jna and was wondering whether you would consider this the best workaround for the synchronized methods (without using multiprocessing).

~Josh

Perhaps a better starting point is: is there any known way to write to multiple files simultaneously from a single Java HDF process?

Hi Josh, I can describe what we did (Bitplane Imaris developer here) to achieve parallel thread-safe reading in C/C++ before SWMR was added to hdf5, hoping that it can help.

The principle is to dynamically load multiple instances of hdf5.dll. In C/C++, the same binary cannot be loaded multiple times (all loaded instances share the same memory space; I assume the same happens in Java), but dynamic linking allows plain copies of the original DLL, each under a different file name, to be loaded independently.

Pseudo-code for parallel writing would look like:

// returns a structure containing hdf5 function pointers
function load_hdf5_lib( dll_name )
    dll = LoadLibrary( dll_name ) // system dependent call
    lib.H5Fopen = GetProcAddress( dll , "H5Fopen" ) // again system dependent
    lib.H5Dopen = GetProcAddress( dll , "H5Dopen1" ) // careful with macro definitions
    lib.H5Dwrite = GetProcAddress( dll , "H5Dwrite" )
    // assign here all other required functions. yes, this is tedious
    return lib

// writes data_array into file_name using the functions in dll_name
function write_hdf5( dll_name , file_name , data_array )
    lib = load_hdf5_lib( dll_name )
    file = lib.H5Fopen( file_name , RW )
    data_set = lib.H5Dopen( file , "data_set" )
    lib.H5Dwrite( data_set, data_array )

// writes data_array into 3 files, one thread per file
function write_hdf5_in_parallel( data_array , file_name_0 , file_name_1 , file_name_2 )
    run_on_thread( write_hdf5( "hdf5_copy_0.dll" , file_name_0 , data_array) )
    run_on_thread( write_hdf5( "hdf5_copy_1.dll" , file_name_1 , data_array) )
    run_on_thread( write_hdf5( "hdf5_copy_2.dll" , file_name_2 , data_array) )
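
The same structure might be sketched in Java roughly as follows. This is only an illustration under assumptions: `PerThreadLibraryCopies`, `FileWriterTask`, `copyLibrary` and `writeInParallel` are hypothetical names, and the native binding itself is left out. With JNA, each task would presumably load its own copy by path (for example via `Native.load`) and call the HDF5 functions through that instance; whether the existing Java HDF5 wrappers can be used this way is not confirmed here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadLibraryCopies {

    /** Hypothetical stand-in for "load this library copy and write one file with it". */
    interface FileWriterTask {
        void write(Path libraryCopy, Path outputFile) throws Exception;
    }

    /** Copies the original library once per worker so each copy can be loaded independently. */
    static List<Path> copyLibrary(Path original, Path targetDir, int copies) throws IOException {
        List<Path> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            Path copy = targetDir.resolve("hdf5_copy_" + i + ".dll");
            Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
            result.add(copy);
        }
        return result;
    }

    /** Runs one task per output file, each on its own thread with its own library copy. */
    static void writeInParallel(List<Path> libraryCopies, List<Path> outputFiles, FileWriterTask task)
            throws Exception {
        if (libraryCopies.size() != outputFiles.size()) {
            throw new IllegalArgumentException("need exactly one library copy per output file");
        }
        ExecutorService pool = Executors.newFixedThreadPool(outputFiles.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < outputFiles.size(); i++) {
                Path lib = libraryCopies.get(i);
                Path out = outputFiles.get(i);
                // The lambda returns a value, so it is submitted as a Callable and may throw.
                pending.add(pool.submit(() -> {
                    task.write(lib, out);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from the worker thread
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The task passed to `writeInParallel` is where the per-copy library instance would be loaded and the open/write calls issued, so no two threads ever share one loaded library.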

Hi Igor,

Thanks for the details, @igor. Reproducing what Bitplane did in C++ is certainly what we’d like to see in Java. If there are no other suggestions, I’ll give yours a try.

Cheers,
~Josh

The slow part of reading and writing is usually the compression. You can use H5Dread_chunk and H5Dwrite_chunk (since HDF5 1.10) to access the compressed streams without processing, and compress/decompress them separately using the gzip library, which can be done in parallel. There is a little overhead in HDF5, but most of H5Dread_chunk/H5Dwrite_chunk is direct disk access. You can implement a pipeline of reading-decompressing and/or compressing-writing. If the bottleneck is the multi-threaded compression or decompression (CPU-bound), or the single-threaded reading or writing is already saturating the disk capacity (I/O-bound), you would not gain anything more by parallelizing the file accesses. This solution may be simpler and more elegant than dynamically loading multiple copies of the library, and it would also provide a boost for single (or few) large files.
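
A minimal Java sketch of the compressing-writing half of such a pipeline is shown here, under stated assumptions. Chunks are compressed in parallel with `java.util.zip.Deflater`, then handed one by one, in order, to a single writer. `ParallelChunkCompression` and `ChunkSink` are hypothetical names; `ChunkSink` stands in for whatever wraps H5Dwrite_chunk, and no HDF5 Java call is used. It is also an assumption that the zlib-wrapped stream `Deflater` produces by default matches what the dataset's deflate filter expects.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

final class ParallelChunkCompression {

    /** Hypothetical sink for already-compressed chunks, e.g. a wrapper around H5Dwrite_chunk. */
    interface ChunkSink {
        void writeChunk(int chunkIndex, byte[] compressed) throws Exception;
    }

    /** Compresses one chunk into a zlib-wrapped stream. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    /** Compresses chunks on a thread pool, then writes them in order from the calling thread. */
    static void compressAndWrite(List<byte[]> chunks, int threads, ChunkSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> compressed = new ArrayList<>();
            for (byte[] chunk : chunks) {
                Callable<byte[]> job = () -> deflate(chunk);
                compressed.add(pool.submit(job));
            }
            // Only this loop touches the sink, so the file access stays single-threaded.
            for (int i = 0; i < compressed.size(); i++) {
                sink.writeChunk(i, compressed.get(i).get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the sink is only ever called from one thread, this design does not depend on the HDF5 library being thread-safe; the parallelism is confined to the CPU-bound compression step.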

Multithreaded file writing in Java has various uses. Developers may speed up their applications by writing to several files simultaneously using multiple threads.

Multithreaded writing in Java (https://cloudfoundation.com/blog/what-is-java/) involves several threads writing to different files concurrently, with each thread writing its data to a separate file. This lets a program write to many files at once, which can improve throughput as long as the disk is not already the bottleneck.
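
For plain (non-HDF5) files, a minimal sketch of one task per file could look like this. `OneThreadPerFile` and `writeAll` are hypothetical names chosen for illustration, and the example assumes every task has its own output path, so no locking is needed.

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OneThreadPerFile {

    /** Writes each entry's lines to its own file, using up to 'threads' worker threads. */
    static void writeAll(Map<Path, List<String>> linesByFile, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> jobs = new ArrayList<>();
            for (Map.Entry<Path, List<String>> e : linesByFile.entrySet()) {
                jobs.add(() -> {
                    // try-with-resources flushes and closes the buffered writer even on failure
                    try (BufferedWriter w = Files.newBufferedWriter(e.getKey(), StandardCharsets.UTF_8)) {
                        for (String line : e.getValue()) {
                            w.write(line);
                            w.newLine();
                        }
                    }
                    return null;
                });
            }
            // invokeAll blocks until every job has finished; get() surfaces any exception
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since no file is shared between tasks, the only coordination required is waiting for all tasks to finish.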

The same pattern appears in database-backed applications, where multiple threads update different tables concurrently, which can improve performance.

Web applications use it as well: with each thread writing to a distinct page, pages can be updated concurrently, which can improve responsiveness.

Thread synchronisation is crucial whenever threads share data or a file. Access to the shared resource must be coordinated so that threads do not interfere with one another, which keeps the data consistent.
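
When several threads do have to share one output, a small wrapper that serialises access is one option. The sketch below is illustrative only; `SynchronizedLineWriter` is a hypothetical class, not part of any library. It makes "write the line plus its terminator" a single atomic step so that lines from different threads cannot interleave.

```java
import java.io.IOException;
import java.io.Writer;

final class SynchronizedLineWriter {
    private final Writer out;

    SynchronizedLineWriter(Writer out) {
        this.out = out;
    }

    /** Writes one complete line; the lock keeps the text and its newline together. */
    synchronized void writeLine(String line) throws IOException {
        out.write(line);
        out.write('\n');
    }

    /** Flushes the underlying writer under the same lock. */
    synchronized void flush() throws IOException {
        out.flush();
    }
}
```

All threads must go through the same `SynchronizedLineWriter` instance; writing to the underlying `Writer` directly would bypass the lock.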

Writing data also requires correct buffering. In any application, improper buffering, such as failing to flush or close a writer, can lose or corrupt data. Used correctly, buffering reduces the number of underlying write calls.

Finally, the speed and scalability of multithreaded file writing in Java must be considered. Adding threads only helps until the disk or the CPU is saturated, so it is worth measuring before and after changing the thread count.