I am working on an early-stage research project. I need to create two datasets, each about 2.5 TB, in which I can accumulate results over multiple runs. The slowest part of my application is allocating the datasets on disk, and I have a few questions about this process.
- Is the allocation/fill process performed in parallel across datasets?
- Are there any tricks that I can use to speed up this process?
I am on a Lustre file system, and it is only the allocation step that is slow. Reading and writing the entire dataset takes under an hour once it is allocated, but the allocation itself currently takes about 3 hours.
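For context, here is a minimal sketch of the kind of setup I mean, assuming HDF5 via h5py (the file name, dataset names, shapes, and chunk sizes below are illustrative stand-ins, scaled far down from the real 2.5 TB datasets). The first dataset uses a chunked layout, where HDF5 defaults to incremental allocation so space is claimed per chunk on first write rather than up front; the second uses the low-level property-list API to skip writing fill values entirely, which is one commonly suggested workaround when early allocation is forced (e.g. under parallel HDF5):

```python
import h5py
import numpy as np

shape = (1024, 1024)  # stand-in for the full-size dataset

with h5py.File("results.h5", "w") as f:
    # Chunked layout: with the default incremental allocation time,
    # chunks are allocated on disk only when they are first written,
    # so creation does not pay the full multi-TB fill cost up front.
    dset = f.create_dataset("accumulated", shape=shape, dtype="f8",
                            chunks=(256, 256))
    dset[0, :] = np.arange(shape[1])  # only this row's chunks get allocated

    # Low-level variant: a dataset-creation property list that tells the
    # library to never write fill values, so even an eager allocation
    # does not have to fill the whole extent with a default value.
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_fill_time(h5py.h5d.FILL_TIME_NEVER)
    space = h5py.h5s.create_simple(shape)
    h5py.h5d.create(f.id, b"no_fill", h5py.h5t.NATIVE_DOUBLE, space, dcpl)
```

This is just my understanding of the knobs involved, not a claim about which one is responsible for the 3-hour allocation I am seeing; I would appreciate corrections if I have the mechanism wrong.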