Hi,
I have a Python script that essentially converts a CSV file to an HDF5 file using PyTables, but I have a few problems that I haven't been able to solve.
1. Column Order - I don't know which columns I need to write until runtime, so creating a subclass of IsDescription was a non-starter. Instead, to define my columns, I am passing a dictionary that maps column name to column class to the create_table method:
from collections import OrderedDict
import tables

h5_file = tables.open_file(filename, mode='w', title='Test File')
group = h5_file.create_group('/', 'data', 'Data Group')
# Map each column name to its Col instance, inserted in the order I want
column_dict = OrderedDict()
for key in column_names:
    column_dict[key] = create_col(key)
table = h5_file.create_table(group, 'table', column_dict, 'Table')
create_col is simply a function that returns Int32Col(), Float64Col(), etc., depending on some information about the column. That part is working fine. However, the columns in the created table are not in the order that I want. I used OrderedDict so that the dictionary holds the columns in insertion order, but the table doesn't reflect this. Any ideas on how to control the column order if I can't subclass IsDescription to create my data type?
2. Variable length strings - Strings work fine when I give them a maximum size. This was fine to get something up and running, but the strings really need to be variable length. Is there a way to have VLString columns within a table? I see examples of VLStringAtom being passed as the atom to h5file.create_vlarray, but I don't see similar examples for table columns, and there isn't a Col class for this type. Any help is appreciated.
3. "Blanks" in my CSV file - The CSV files I'm converting contain null or blank values. If you imagine loading the file in Excel or a similar program, some cells will be blank. So, even if column X is an Int32Col, some of its rows may be blank. How would I handle this using PyTables? I suppose I could substitute some sentinel value for blank cells, but I would like to avoid that if possible.
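To make the fallback in item 3 concrete, here is a rough sketch of the substitution I would rather avoid. It uses only the standard library, and the sample data, the parse_int_column helper, and the INT_SENTINEL value are made up for illustration; they are not part of my actual script. It also keeps a parallel list of blank flags, on the assumption that the blanks need to be recoverable later.

```python
import csv
import io

# Hypothetical sentinel: the smallest value a signed 32-bit integer can hold.
# This only works if that value never occurs in the real data.
INT_SENTINEL = -2**31


def parse_int_column(rows, name, sentinel=INT_SENTINEL):
    """Return (values, is_blank) for one integer column of parsed CSV rows."""
    values, is_blank = [], []
    for row in rows:
        cell = (row.get(name) or "").strip()
        blank = cell == ""
        # Blank cells get the sentinel; everything else is parsed as an int.
        values.append(sentinel if blank else int(cell))
        is_blank.append(blank)
    return values, is_blank


# Made-up sample: the second data row has a blank in column X.
sample = "X,Y\n1,a\n,b\n3,c\n"
rows = list(csv.DictReader(io.StringIO(sample)))
x_values, x_blank = parse_int_column(rows, "X")
```

The values would go into the Int32Col as-is, and the flags could presumably go into a companion column (a BoolCol, say) so that a reader can tell a real value from a substituted one. That doubles the bookkeeping per nullable column, which is why I am hoping there is a better way.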
Help on any of these items is greatly appreciated. I know that using h5py instead of PyTables would probably solve these problems (would I have to use the low-level API?), but I am trying to avoid that since PyTables has otherwise been so easy to use.
Thanks in advance,
Sarah