Converting HDF5 to CSV


#1

Hi,

I have a nested HDF5 file (groups inside groups, and then datasets).
My datasets are 1D (a single column with multiple rows), 2D (multiple columns and multiple rows), and 3D (a stack of 2D datasets).

I want to convert these datasets into CSV.

The output can be either a single CSV file containing all the datasets, or a separate CSV file for each dataset.

How can I do this?

Regards,
sumit


#2

… I want to convert this [3D] dataset into CSV file(s)

Just curious: Why? As you mention in your own question, 3D datasets don’t map well to 2D spreadsheets. We learned this ourselves when we created the ODBC driver for HDF5, e.g. how to map cyclic graph structures to a 2D table? It’s an imperfect fit at best.

To clarify: I’m not suggesting that what you’re trying to do is a bad idea, just curious as to the use-case, since there may be an easier approach, e.g. ways to access HDF5 files from directly inside Excel vs. converting to CSV.
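For what it's worth, one common way around the 3D-to-2D mismatch mentioned above is a "long" layout: one row per element, with the indices as extra columns. Here is a minimal sketch of that idea in plain Python. The `long_rows` helper, the column names and the sample `cube` are all made up for illustration, and the data is assumed to already be in memory as nested lists.

```python
import csv
import io
from itertools import product


def long_rows(data, shape):
    """Yield one (i, j, ..., value) tuple per element of a nested list."""
    for index in product(*(range(n) for n in shape)):
        value = data
        for i in index:
            value = value[i]  # descend one nesting level per index
        yield index + (value,)


# Made-up 2 x 2 x 2 sample standing in for a 3D dataset.
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["i", "j", "k", "value"])  # hypothetical column names
writer.writerows(long_rows(cube, (2, 2, 2)))
print(buffer.getvalue())
```

A table in this shape has a fixed number of columns regardless of the dataset's size, which tends to suit SQL-style tools, at the cost of a much larger file.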

– dave

P.S. Here’s an example of a specific dataset being converted from HDF5 to CSV: https://github.com/amgreenstreet/Million-Song-Dataset-HDF5-to-CSV
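For the generic case (nested groups, one CSV per dataset), a rough sketch could look like the following. It assumes the third-party h5py and NumPy packages are installed and that the datasets are plain numeric arrays of at most three dimensions. The names `csv_name`, `to_rows` and `export_all`, and the `__` file-naming scheme, are illustrative choices and not part of any HDF5 tool. A 3D dataset is written as its 2D slices stacked vertically, with the slice index in the first column.

```python
import csv
import os


def csv_name(hdf5_path):
    """Map an object path like 'run1/sensors/temp' to 'run1__sensors__temp.csv'."""
    return hdf5_path.strip("/").replace("/", "__") + ".csv"


def to_rows(data, ndim):
    """Turn nested lists of depth `ndim` (0 to 3) into a list of CSV rows."""
    if ndim == 0:
        return [[data]]
    if ndim == 1:
        return [[v] for v in data]
    if ndim == 2:
        return [list(row) for row in data]
    if ndim == 3:
        # Prefix every row with the index of the 2D slice it came from.
        return [[k] + list(row) for k, plane in enumerate(data) for row in plane]
    raise ValueError("this sketch only handles 0-D to 3-D datasets")


def export_all(h5_file, out_dir):
    """Write one CSV per dataset found anywhere in the group hierarchy."""
    # Third-party imports are kept local so the helpers above work without them.
    import h5py
    import numpy as np

    os.makedirs(out_dir, exist_ok=True)
    written = []

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root.
        if isinstance(obj, h5py.Dataset):
            arr = np.asarray(obj[()])  # reads the whole dataset into memory
            target = os.path.join(out_dir, csv_name(name))
            with open(target, "w", newline="") as fh:
                csv.writer(fh).writerows(to_rows(arr.tolist(), arr.ndim))
            written.append(target)

    with h5py.File(h5_file, "r") as f:
        f.visititems(visit)
    return written
```

Calling `export_all("data.h5", "csv_out")` (both names are placeholders) would return the list of files written. Since each dataset is read fully into memory, very large datasets would need to be read in chunks instead.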


#3

Thanks David,

My goal is to load the HDF5 data into Hive tables and store it in HDFS.
I also want to run Hive queries on that data.

Is there a way to feed the data directly into Hive tables, instead of first converting the HDF5 data to CSV files and then loading those into Hive?

Regards
sumit


#4

With the new HDFS Connector – part of HDF5 Enterprise Support (https://www.hdfgroup.org/solutions/enterprise-support/) – you can just dump your HDF5 files onto Hadoop and then run a standard MapReduce job. No need to convert or extract data into another format. Spark is another good option for running analytics on HDF5 files in place without conversion.

– Dave