Java int limit question


#1

I have code that works really well for writing a dataset from a Java array. The problem is that I reach the int size limit after about 20 million records and get an error. I would like to keep this code and somehow get around the limit. My dset_data is a byte array, and a byte array has to be sized with an int. One record is 105 bytes, so at just over 20 million records the total byte count no longer fits in an int.

	dset_data = new byte[(int) dims[0] * Demand_Datatype.getDataSize()];
	
	ByteBuffer outBuf = ByteBuffer.wrap(dset_data);
	outBuf.order(ByteOrder.nativeOrder());

	for (int indx = 0; indx < (int) dims[0]; indx++) {
		object_data[indx].writeBuffer(outBuf, indx * Demand_Datatype.getDataSize());
	}
    
	// Write the Data Set
	try {
		if ((dataset_id >= 0) && (memtype_id >= 0))
			H5.H5Dwrite(dataset_id, memtype_id, HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
					HDF5Constants.H5P_DEFAULT, dset_data);
	} catch (Exception e) {
		e.printStackTrace();
	}
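
For reference, the arithmetic behind the limit can be checked with a small standalone sketch. The class and method names here are made up for illustration; the only figures taken from the post are the 105-byte record size and the 100 million record target mentioned later in the thread. Note that a real JVM may cap array length a few elements below Integer.MAX_VALUE, so the true ceiling can be slightly lower than what this prints.

```java
public class IntLimitCheck {
    // Record size quoted in the post above.
    static final int RECORD_SIZE = 105;

    // Largest record count whose packed byte length still fits in an int.
    static long maxRecordsPerArray(int recordSize) {
        return Integer.MAX_VALUE / recordSize;
    }

    // Total byte count computed in long arithmetic, so it cannot wrap around.
    static long totalBytes(long records, int recordSize) {
        return records * recordSize;
    }

    public static void main(String[] args) {
        System.out.println("Max records in one byte[]: " + maxRecordsPerArray(RECORD_SIZE));

        long records = 100_000_000L;
        System.out.println("Bytes needed (long math): " + totalBytes(records, RECORD_SIZE));

        // Same shape as `(int) dims[0] * getDataSize()`: the cast binds first,
        // then the multiplication is done in int and wraps around silently.
        int wrapped = (int) records * RECORD_SIZE;
        System.out.println("Bytes needed (int math):  " + wrapped);
    }
}
```

The int product does not throw by itself; it just yields a wrong size, so the array is either too small or fails to allocate.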

#2

Richard,
Unfortunately, Java arrays are indexed by int, so a single array cannot hold more than Integer.MAX_VALUE elements. That limitation is a problem when the underlying C library can support a 64-bit address space.

The only solution I can suggest for HDF5 is to use hyperslabs. This is what we do in HDFView.
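
A rough sketch of how the batching side of this might look in plain Java follows. It only plans the (start, count) pairs along the single record dimension and reuses one batch-sized buffer; the HDF5 calls are left as comments because their exact signatures depend on which Java wrapper version is in use. The class name, planSlabs, and the batch size of 100,000 records are assumptions made for illustration, not part of HDFView or the HDF5 library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class BatchedWriteSketch {
    // Hypothetical helper: split totalRecords into {start, count} pairs,
    // each covering at most batchRecords records.
    static List<long[]> planSlabs(long totalRecords, int batchRecords) {
        List<long[]> slabs = new ArrayList<>();
        for (long start = 0; start < totalRecords; start += batchRecords) {
            long count = Math.min(batchRecords, totalRecords - start);
            slabs.add(new long[] { start, count });
        }
        return slabs;
    }

    public static void main(String[] args) {
        final int recordSize = 105;             // from the thread
        final long totalRecords = 100_000_000L; // from the thread
        final int batchRecords = 100_000;       // assumed; any size whose byte count fits in an int

        // One reusable buffer sized for a single batch, not the whole dataset.
        byte[] batch = new byte[batchRecords * recordSize];
        ByteBuffer outBuf = ByteBuffer.wrap(batch).order(ByteOrder.nativeOrder());

        List<long[]> slabs = planSlabs(totalRecords, batchRecords);
        for (long[] slab : slabs) {
            long start = slab[0];
            long count = slab[1];
            outBuf.clear();
            // Pack records start .. start+count-1 into outBuf here, using the
            // offset i * recordSize relative to the batch (i runs from 0 to count-1).
            //
            // Then, with the HDF5 wrapper in use:
            //   1. select {start} / {count} in the file dataspace with H5Sselect_hyperslab
            //   2. describe the buffer with a memory dataspace of `count` records
            //   3. call H5Dwrite with that memory space and file space instead of H5S_ALL
        }
        System.out.println("Batches planned: " + slabs.size());
    }
}
```

Because every offset inside a batch is relative to the batch, none of the int multiplications can overflow as long as batchRecords * recordSize stays below the int limit.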

Allen


#3

I have gone through several examples trying to do this, but most do not take a complex data set into account; they are usually for something like a simple two-dimensional set of integers. I have 7 different types but only 1 level. That said, I cannot get the examples to work. I have a file that needs to get to 100 million records, but it is truncated at about 20 million rows because of the int size of my buffer. Our vendor is forcing us to provide data in HDF5 format, so the missing data is a huge problem. I keep getting pointed to the example pages, but the examples are too simplistic.


#4

Yes, that is the trouble with examples!

Is the object library a hard requirement? It was designed to abstract away the differences between HDF4 and HDF5, and compromises were made along the way. I would suggest using the Java library from the HDF5 1.10.2 release. It has almost full support for the HDF5 API, and any C example or code will be almost one-to-one applicable.

Allen


#5

I started working through how to create the dataset as chunked with unlimited dimensions, so that I could reopen the file and add to it if I cannot write the full size all at once. Since this is a complex type, I use a memtype. When I get to the dataset write, it fails with a conversion error. I am thinking I need an xfer_plist, but all the examples I see just set that to default. I don’t know whether it is having trouble with one value in the list or with using a composite memtype at all.

	int xfer_plist_id = HDF5Constants.H5P_DATASET_XFER_DEFAULT;
	try {
		if ((dataset_id >= 0) && (memtype_id >= 0))
			//H5.H5Dwrite(dataset_id, memtype_id, HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
			//		HDF5Constants.H5P_DEFAULT, dset_data);
			H5.H5Dwrite(dataset_id, memtype_id, HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
					HDF5Constants.H5P_DEFAULT, dset_data);
			//H5.H5Dwrite(dataset_id, memtype_id, mem_space_id, filespace_id, xfer_plist_id, dset_data);
			H5.H5Dwrite(dataset_id, memtype_id, dataspace_id, file_id, xfer_plist_id, dset_data);
			System.out.println("Data Set Written");
	} catch (Exception e) {
		System.out.println("Exception writing data set " + e.getMessage());
		//e.printStackTrace();
	} 

I get this error:

Exception writing data set Datatype:Unable to initialize object ["…\src\H5T.c line 4548 in H5T_path_find(): no appropriate function for conversion path

Here are my data types:

	static int[] memberMemTypes = { HDF5Constants.H5T_NATIVE_UINT64,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_C_S1,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_NATIVE_UINT16,
			HDF5Constants.H5T_NATIVE_UINT16,
			HDF5Constants.H5T_NATIVE_UINT32,
			HDF5Constants.H5T_NATIVE_UINT8,
			HDF5Constants.H5T_NATIVE_UINT8 };

#6

Hi Richard,

It looks like you may wish to write subsets to a dataset that has a compound datatype. Is that right?

I don’t have a Java example that shows this, but I do have a C example that I attached:

cmploop.c (3.9 KB)

It appends to a dataset with a compound datatype in a loop.
Subsetting is done with the dataspace interface using H5Sselect_hyperslab.

There are Java examples on this page:
https://portal.hdfgroup.org/display/HDF5/Examples+by+API

Under “Datasets” there is a Java example that shows how to Read / Write a Chunked Dataset and one that shows how to Read / Write by Hyperslabs.

-Barbara