How to add column names to a dataset in C#


#1

Hello,

How do I add column names to a dataset in C#? I have searched everywhere but could not find a way to name my columns. Can someone please help? Thank you in advance.



#2

Hi locquan,

The dataset needs to be a compound for its columns (i.e. members) to have names associated with them. I am not sure how this is done with other libraries, but with HDFql a compound dataset named dset of one dimension (size 100), composed of two columns (named first_column and second_column) both of data type float, can be created in C# as follows:

HDFql.Execute("CREATE DATASET dset AS COMPOUND(first_column AS FLOAT, second_column AS FLOAT)(100)");

If needed, the names of the columns (i.e. members) of dataset dset can be retrieved as follows:

HDFql.Execute("SHOW MEMBER dset");

Alternatively, the dataset could be a regular 2D dataset (like the one illustrated in your screenshot) with an attribute containing the names of the columns. Example:

// create a dataset named 'dset' of two dimensions (size 100x2) of data type float
HDFql.Execute("CREATE DATASET dset AS FLOAT(100, 2)");

// create an attribute named 'names' (associated to dataset 'dset') of one dimension (size 2) of data type varchar with initial values 'first_column' and 'second_column'
HDFql.Execute("CREATE ATTRIBUTE dset/names AS VARCHAR(2) VALUES(first_column, second_column)");
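If the number of columns varies, the attribute statement above can be built from an array of names. The following is a minimal sketch, assuming the names are plain identifiers (no commas, parentheses or spaces) so that the same unquoted VALUES(...) form as above still applies; ColumnNamesStatement and Build are hypothetical helper names, not part of HDFql.

```csharp
using System;

// Hypothetical helper (not part of HDFql): builds the CREATE ATTRIBUTE statement
// that stores one name per column, using the unquoted VALUES(...) form.
// Assumes every name is a plain identifier without commas, parentheses or spaces.
public static class ColumnNamesStatement
{
    public static string Build(string datasetPath, string[] names)
    {
        if (names == null || names.Length == 0)
            throw new ArgumentException("At least one column name is required.", nameof(names));

        // VARCHAR(n): one-dimensional attribute with one element per column name.
        return "CREATE ATTRIBUTE " + datasetPath + "/names AS VARCHAR(" + names.Length + ")"
             + " VALUES(" + string.Join(", ", names) + ")";
    }
}
```

For example, HDFql.Execute(ColumnNamesStatement.Build("dset", new[] { "first_column", "second_column" })) should issue exactly the statement shown above.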

As a side note, besides C#, the HDFql statements in the snippets above can be executed in C, C++, Java, Python, Fortran and R without any modification, which is ideal if you or your organization work with several of these programming languages at the same time.

Hope this helps!


#3

Hello HDFql,

Thank you for helping me get started with HDFql.

I am now stuck on how to write two arrays into the two columns of the one-dimensional compound dataset. I have valuesX and valuesY, and want to write them into X_column and Y_column, respectively.

        int[] valuesX = new int[100];
        int[] valuesY = new int[100];

        for (int i = 0; i < 100; i++)
            valuesX[i] = i;

        for (int j = 0; j < 100; j++)
            valuesY[j] = 100 - j;

        HDFql.Execute("CREATE FILE painters.h5");
        HDFql.Execute("USE FILE painters.h5");
        HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

        HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(X_column AS FLOAT, Y_column AS FLOAT)(100)");

        // ?? How do I write valuesX and valuesY into X_column and Y_column here?

        HDFql.Execute("CREATE ATTRIBUTE picasso/guernica/subject AS UTF8 VARCHAR VALUES(\"guerra civil española\")");
        HDFql.Execute("CLOSE FILE");

Thank you in advance for your help.


#4

Hi locquan,

First, the code you have posted contains a potential issue: variables valuesX and valuesY are both declared as int, while the members of the compound are declared as float. I will assume hereafter that the members are of data type int; feel free to change them to float if that is what you intended.

There is more than one way to implement your use-case. For instance, since both members are of the same data type, you could: 1) create a 2D array of this type, 2) populate the array with values, 3) register the array, and 4) write the array into the compound guernica. In other words:

int [,]values = new int[100, 2];

for(int i = 0; i < 100; i++)
{
    values[i, 0] = i;
    values[i, 1] = 100 - i;
}

HDFql.Execute("CREATE FILE painters.h5");

HDFql.Execute("USE FILE painters.h5");

HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(X_column AS INT, Y_column AS INT)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));
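The reason a plain 2D array can feed a compound of two INT members is memory order: C# stores rectangular arrays such as int[100, 2] in row-major order, so the values sit in memory as X0, Y0, X1, Y1, ..., which is the same interleaving as 100 compound elements of two ints without padding. The sketch below makes that order visible by copying the raw bytes into a flat array with Buffer.BlockCopy; RowMajorLayout and its methods are hypothetical helpers, not part of HDFql.

```csharp
using System;

// Hypothetical helpers (not part of HDFql) showing the in-memory order of int[rows, 2].
public static class RowMajorLayout
{
    // Fills the array exactly like the snippet above: column 0 = X, column 1 = Y.
    public static int[,] BuildValues(int rows)
    {
        int[,] values = new int[rows, 2];
        for (int i = 0; i < rows; i++)
        {
            values[i, 0] = i;        // X_column
            values[i, 1] = 100 - i;  // Y_column
        }
        return values;
    }

    // Copies the raw bytes of the rectangular array into a one-dimensional array,
    // so element k of the result is the k-th int as it is stored in memory.
    public static int[] Flatten(int[,] values)
    {
        int[] flat = new int[values.Length];
        Buffer.BlockCopy(values, 0, flat, 0, Buffer.ByteLength(values));
        return flat;
    }
}
```

Flatten(BuildValues(100)) starts with 0, 100, 1, 99, ..., i.e. each row's X is immediately followed by its Y, just as in the compound.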

Another (perhaps more appropriate) way is to use a struct instead of a 2D array to populate the compound guernica. Example:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 0)]
struct Data
{
    public int X;
    public int Y;
}

Data []values = new Data[100];

for(int i = 0; i < 100; i++)
{
    values[i].X = i;
    values[i].Y = 100 - i;
}

HDFql.Execute("CREATE FILE painters.h5");

HDFql.Execute("USE FILE painters.h5");

HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(X_column AS INT, Y_column AS INT)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));

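As you can see from the code above, Data is a sequential struct, so its fields are laid out in declaration order and match the members of the compound (which is created without padding by default); with two int fields, no padding is inserted between them. Note that in .NET, Pack = 0 does not mean zero padding: it selects the platform's default packing, and Pack = 1 is the value that removes padding. If the struct does contain padding (e.g. because it mixes members of different sizes and is not declared with Pack = 1), then in the HDFql statement that creates the compound you will have to either: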

  1. specify the size (of the compound) and the offsets of each member of the compound, or:

  2. specify the size (of the memory) and the offsets of each member of the memory
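Whether a struct contains padding can be checked at run time with Marshal.SizeOf and Marshal.OffsetOf. The sketch below uses two hypothetical structs (not from this thread) whose members have different sizes, one declared with Pack = 0 (platform default packing) and one with Pack = 1; PackingDemo.Describe is likewise a hypothetical helper.

```csharp
using System.Runtime.InteropServices;

// Pack = 0: platform default packing, so each member is aligned to its natural size
// and padding may be inserted between members.
[StructLayout(LayoutKind.Sequential, Pack = 0)]
public struct DefaultPacked
{
    public byte Flag;
    public int Age;
    public double Weight;
}

// Pack = 1: no padding, members follow each other byte by byte,
// like the members of a compound created without padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TightlyPacked
{
    public byte Flag;
    public int Age;
    public double Weight;
}

public static class PackingDemo
{
    // Reports the total size and the offsets of Age and Weight for struct type T.
    public static string Describe<T>() where T : struct
    {
        return string.Format("size={0}, Age@{1}, Weight@{2}",
            Marshal.SizeOf<T>(),
            Marshal.OffsetOf<T>("Age"),
            Marshal.OffsetOf<T>("Weight"));
    }
}
```

On a typical x64 machine, Describe<DefaultPacked>() should report size=16, Age@4, Weight@8 (three padding bytes after Flag), whereas Describe<TightlyPacked>() should report size=13, Age@1, Weight@5. Only the second layout matches a compound created without padding; for the first, the size and offsets would have to be given to HDFql as described above.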

Good luck!


#5

Thank you so much for your help. Now I am trying to create a dataset with mixed data types, but HDFql.VariableTransientRegister(values) does not seem to work.

[StructLayout(LayoutKind.Sequential, Pack = 0)]
struct Data
{
    public string Name;
    public int Age;
    public double Weight;
}

private void button2_Click(object sender, EventArgs e)
{
    Data[] values = new Data[100];

    for (int i = 0; i < 100; i++)
    {
        values[i].Name = i.ToString();
        values[i].Age = i;
        values[i].Weight = Convert.ToDouble(i);
    }

    HDFql.Execute("CREATE FILE painters.h5");

    HDFql.Execute("USE FILE painters.h5");

    HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

    HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(name AS VARCHAR, age AS UNSIGNED INT, weight AS FLOAT)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));
}

Thank you for your help in advance.

Warm Regards,


#6

Currently, compounds with members of variable-length data types (e.g. string) are not supported in C#. This limitation was announced in the release notes of HDFql version 2.2.0.

Until HDFql supports variable-length data types in C#, a workaround is to 1) declare member Name (in struct Data) as a fixed-size array (e.g. 20 elements) of data type byte, and 2) change the statement that creates the dataset as follows (note that member weight is declared as DOUBLE to match the double field Weight):

HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(name AS CHAR(20), age AS UNSIGNED INT, weight AS DOUBLE)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));
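For this workaround, each name presumably has to occupy exactly 20 bytes to line up with the CHAR(20) member. A minimal sketch, assuming ASCII text is acceptable (Encoding.ASCII turns non-ASCII characters into '?') and that shorter names should be padded with zero bytes; FixedString and ToFixedAscii are hypothetical names. The resulting bytes would still need to be copied into the struct's fixed-size name field, since a byte[] field only holds a reference to an array stored elsewhere, not the bytes themselves.

```csharp
using System;
using System.Text;

// Hypothetical helper: converts text into a buffer of exactly `size` bytes.
public static class FixedString
{
    // Longer text is truncated; shorter text is padded with zero bytes
    // (a new byte array is zero-initialised by the runtime).
    public static byte[] ToFixedAscii(string text, int size)
    {
        byte[] buffer = new byte[size];
        byte[] encoded = Encoding.ASCII.GetBytes(text ?? string.Empty);
        Array.Copy(encoded, buffer, Math.Min(encoded.Length, size));
        return buffer;
    }
}
```

For instance, ToFixedAscii("Joe", 20) yields the bytes of "Joe" followed by 17 zero bytes.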

If suitable for you, another workaround is to use HDFql from a programming language that fully supports variable-length data types, such as C or C++.


#7

Hello,

I now use a fixed string size of 3, but I am still getting the same error in HDFql.

[StructLayout(LayoutKind.Sequential, Pack = 0)]
struct Data
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] Name;
    public int Age;
    public double Weight;
}

private void button2_Click(object sender, EventArgs e)
{
    Data[] values = new Data[100];

    string author = "Joe";

    for (int i = 0; i < 100; i++)
    {
        values[i].Name = Encoding.ASCII.GetBytes(author);
        values[i].Age = i;
        values[i].Weight = Convert.ToDouble(i);
    }

    HDFql.Execute("CREATE FILE painters.h5");

    HDFql.Execute("USE FILE painters.h5");

    HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

    HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(name AS CHAR(3), age AS UNSIGNED INT, weight AS FLOAT)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));
}

Thank you for your help in advance.

Warm Regards,


#8

Thanks for the feedback!

Try the following:

[StructLayout(LayoutKind.Sequential, Pack = 0)]
unsafe struct Data
{
    public fixed byte Name[3];
    public int Age;
    public double Weight;
}


public unsafe static void Main(string []args)
{

    Data []values = new Data[100];
	
    for(int i = 0; i < 100; i++)
    {			
        fixed (Data* p = &(values[i]))  
        {  
            p->Name[0] = (byte) 'J';  
            p->Name[1] = (byte) 'o';  
            p->Name[2] = (byte) 'e';  
        }
        values[i].Age = i;
        values[i].Weight = i;
    }

    HDFql.Execute("CREATE FILE painters.h5");

    HDFql.Execute("USE FILE painters.h5");

    HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

    HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(name AS CHAR(3), age AS UNSIGNED INT, weight AS DOUBLE)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values) + " OFFSET(0, 4, 8)");
}

Please note that the members’ offsets (i.e. 0, 4 and 8) are hard-coded in the code snippet above. To make the code more portable and robust, you should instead use an appropriate C# method (such as Marshal.OffsetOf) that calculates and returns the members’ offsets within struct Data. Also, the code must be compiled with the /unsafe option.
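A sketch of such a method, using Marshal.OffsetOf<T> from System.Runtime.InteropServices to build the OFFSET(...) suffix instead of hard-coding it. CompoundOffsets, OffsetClause and the stand-in struct SampleRow are hypothetical; SampleRow replaces the fixed byte Name[3] with a single byte so that the sketch compiles without /unsafe, and with default packing it should give the same offsets 0, 4 and 8.

```csharp
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical stand-in for struct Data (no fixed buffer, so no /unsafe needed).
// Default packing: byte at 0, int at 4, double at 8.
[StructLayout(LayoutKind.Sequential)]
public struct SampleRow
{
    public byte Initial;
    public int Age;
    public double Weight;
}

public static class CompoundOffsets
{
    // Builds " OFFSET(a, b, c)" from the offsets the runtime reports for the given
    // C# field names, listed in the same order as the compound members.
    public static string OffsetClause<T>(params string[] fieldNames) where T : struct
    {
        var offsets = fieldNames.Select(name => Marshal.OffsetOf<T>(name).ToInt64());
        return " OFFSET(" + string.Join(", ", offsets) + ")";
    }
}
```

With the unsafe Data struct above, the statement could then end with HDFql.VariableTransientRegister(values) + CompoundOffsets.OffsetClause<Data>("Name", "Age", "Weight"). I expect Marshal.OffsetOf to handle the fixed buffer Name as well (the compiler emits it as a field of a generated struct type), but that is worth verifying on your setup.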

FYI, we have improved the way the HDFql C# wrapper handles variables that cannot be pinned (by the C# garbage collector) when registering them, which was the case you were facing. Now, instead of terminating abruptly due to an exception, the wrapper handles this case gracefully by returning an error (HDFql.ErrorUnexpectedDataType). This improvement should be available in the next official release of HDFql.


#9

Hello HDFql,

Thank you for your previous help.

Now I have encountered a new issue when creating a dataset in a new file. Using the same sample code from this thread, I put a loop that runs 15 times around your code. This should create 15 files with the same contents, each with a new file name.

For some reason, after the 8th file it consistently stops creating the dataset. When I run the same code on another computer, it consistently stops creating the dataset after the 1st file. Is HDFql running out of memory somewhere?

    private void button1_Click(object sender, EventArgs e)
    {
        for (int iIndex = 0; iIndex < 15; iIndex++)
        {
            int[,] values = new int[100, 2];

            for (int i = 0; i < 100; i++)
            {
                values[i, 0] = i;
                values[i, 1] = 100 - i;
            }

            string sFileName = "painters" + iIndex.ToString() + ".h5";
            HDFql.Execute("CREATE FILE " + sFileName);
            HDFql.Execute("USE FILE " + sFileName);
            HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

            HDFql.Execute("CREATE ATTRIBUTE picasso/subject AS UTF8 VARCHAR VALUES(\"Les Demoiselles d'Avignon\")");

            HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(X_column AS INT, Y_column AS INT)(100) VALUES FROM MEMORY " + HDFql.VariableTransientRegister(values));

            HDFql.Execute("CREATE ATTRIBUTE picasso/guernica/subject AS UTF8 VARCHAR VALUES(\"guerra civil española\")");
            HDFql.Execute("CLOSE FILE");
        }
    }


Thank you in advance for your help.


#10

Hi @locquan,

It seems that you are working with old HDF5 files. Could you please delete all HDF5 files and run the code you have posted again? This should solve the issue you are facing.

Alternatively, you could truncate (i.e. erase) the HDF5 file when creating it, to make sure you are not working with an old HDF5 file:

HDFql.Execute("CREATE TRUNCATE FILE " + sFileName);
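For the first suggestion (deleting the old HDF5 files before running the code again), the cleanup can also be done from C#. A minimal sketch with a hypothetical helper OldFileCleanup.DeleteMatching; the directory and the painters*.h5 pattern are assumptions based on the file names used in this thread, and deleted files cannot be recovered.

```csharp
using System.IO;

// Hypothetical helper: removes stale files before a run.
public static class OldFileCleanup
{
    // Deletes every file in `directory` matching `pattern` (e.g. "painters*.h5")
    // and returns how many files were deleted.
    public static int DeleteMatching(string directory, string pattern)
    {
        int deleted = 0;
        foreach (string path in Directory.GetFiles(directory, pattern))
        {
            File.Delete(path);
            deleted++;
        }
        return deleted;
    }
}
```

For example, OldFileCleanup.DeleteMatching(".", "painters*.h5") could be called once before the loop that creates the files.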

Hope this helps!


#11

Hi HDFql,

I tried both suggestions. It still doesn’t work.

To make sure no HDF5-related software (including HDFView) was already installed, I ran it on a fresh PC.

Below is the entire directory that I took to the fresh PC. After the 8th file, the dataset no longer gets created. All attributes are created fine.

Thank you in advance for your help.


#12

It seems that there is an issue with the VariableTransientRegister method. As a workaround, please use the VariableRegister method instead, as follows:

public class HDFqlExample
{
    public static void Main(string []args)
    {

        int[,] values = new int[100, 2];
        int number;

        number = HDFql.VariableRegister(values);

        for (int i = 0; i < 100; i++)
        {
            values[i, 0] = i;
            values[i, 1] = 100 - i;
        }

        for (int iIndex = 0; iIndex < 15; iIndex++)
        {
            string sFileName = "painters" + iIndex.ToString() + ".h5";

            HDFql.Execute("CREATE FILE " + sFileName);

            HDFql.Execute("USE FILE " + sFileName);

            HDFql.Execute("CREATE GROUP picasso ORDER TRACKED");

            HDFql.Execute("CREATE ATTRIBUTE picasso/subject AS UTF8 VARCHAR VALUES(\"Les Demoiselles d'Avignon\")");

            HDFql.Execute("CREATE DATASET picasso/guernica AS COMPOUND(X_column AS INT, Y_column AS INT)(100) VALUES FROM MEMORY " + number);

            HDFql.Execute("CREATE ATTRIBUTE picasso/guernica/subject AS UTF8 VARCHAR VALUES(\"guerra civil española\")");

            HDFql.Execute("CLOSE FILE");
        }

        HDFql.VariableUnregister(values);

    }
}

FYI, we will release a new version of HDFql containing a fix for this issue as soon as version 1.8.22 of the HDF5 library is released, which should happen later this year.