Connectivity with HDF5 data and meta-data tagging


#1

Hi
I have been working with HDF5 for a couple of weeks. It seems very good and effective, but I am facing the following issues related to connectivity and related topics.

  • How to directly access the data from an HDF5 dataset
    :- If I want to access data from a particular dataset of an HDF5 file directly, how do I do this?
    And if I make any changes to the data, will they be permanent in the original dataset?

  • How to copy particular data from a dataset
    :- If my data has a 2-D/3-D structure, how do I retrieve or copy a particular column of data
    from it?

  • Data tagging
    :- If I have metadata for a dataset in an HDF5 file and I need to tag the data with some
    attributes, can I tag a particular column of data after writing the data, or do I have to
    do this at the time the data is written?

  • Data search
    :- If I want to find particular data in a tagged dataset in an HDF5 file, how can I find it
    with minimal time and computational cost?

  • Difference between HDF5 and HDFS
    :- What is the actual difference between HDF5 and the HDFS file system, and can we use Hadoop
    tools on HDF5 files?


#2
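
On direct access, copying a column, and tagging: here is a minimal sketch, assuming the Python h5py package is an acceptable tool for you. The file name, dataset name, and attribute names are hypothetical and only for illustration. It shows that writes through a dataset opened in "r+" mode go to the file, that slicing reads a copy of just the selected column, and that attributes can be added after the data has been written. Note that HDF5 attributes attach to a dataset or group as a whole, so tagging a single column is a naming convention rather than a built-in feature.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Create a small 4x3 dataset to work with.
with h5py.File(path, "w") as f:
    f.create_dataset("measurements", data=np.arange(12, dtype="f8").reshape(4, 3))

# "r+" opens an existing file for reading and writing.
with h5py.File(path, "r+") as f:
    dset = f["measurements"]

    dset[0, 0] = 100.0   # assignment on the dataset is written to the file
    col = dset[:, 1]     # slicing reads only column 1 into a NumPy array
    col[0] = -1.0        # changes the in-memory copy only, not the file

    # Attributes can be attached any time after the data exists.
    dset.attrs["units"] = "volts"
    # Convention (an assumption, not an HDF5 feature): encode the column
    # index in the attribute name to "tag" a single column.
    dset.attrs["column_1_label"] = "pressure"

# Reopen read-only to check what was actually persisted.
with h5py.File(path, "r") as f:
    dset = f["measurements"]
    first_value = float(dset[0, 0])
    column_1 = dset[:, 1]
    units = dset.attrs["units"]
    column_1_label = dset.attrs["column_1_label"]
```

After reopening, the element assigned through the dataset keeps its new value, while the edit made to the sliced copy is not in the file. To persist a change made on a copy, assign it back, e.g. `dset[:, 1] = col`.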

Data Search: If I want to find particular data in a tagged dataset in an HDF5 file,
how can I find it with minimal time and computational cost?

You have a LOT of options, including:

  • Python tools, e.g. PyTables, pandas
  • HDF NoDB
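
To make the search question concrete, here is a rough sketch using h5py (chosen here only for brevity, not one of the tools listed above). The file layout, the "sensor" attribute, and the helper `find_by_attr` are all hypothetical. It walks the file to find datasets carrying a given tag, then filters values with a NumPy mask.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file with three tagged datasets, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "tagged.h5")

with h5py.File(path, "w") as f:
    a = f.create_dataset("run1/temperature", data=np.array([20.5, 31.0, 18.2, 35.4]))
    a.attrs["sensor"] = "A"
    b = f.create_dataset("run1/pressure", data=np.array([1.0, 1.2, 0.9, 1.1]))
    b.attrs["sensor"] = "B"
    c = f.create_dataset("run2/temperature", data=np.array([25.0, 26.0]))
    c.attrs["sensor"] = "A"


def find_by_attr(h5file, key, value):
    """Return the paths of all datasets whose attribute `key` equals `value`."""
    matches = []

    def visitor(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset) and obj.attrs.get(key) == value:
            matches.append(name)

    h5file.visititems(visitor)
    return matches


with h5py.File(path, "r") as f:
    # Search by tag: only attributes are inspected, no dataset values are read.
    tagged = find_by_attr(f, "sensor", "A")

    # Search by value: read one dataset and filter it in memory.
    values = f["run1/temperature"][...]
    hot = values[values > 30.0]
```

This visits every object in the file and loads the whole dataset before filtering, so it is simple rather than optimal. For large tables, PyTables and the pandas table format can build indexes and evaluate queries without loading everything, which is usually where the time savings come from.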

Difference between HDF5 and HDFS: What is the actual difference between HDF5
and the HDFS file system?

I think Gerd’s article still provides the best description of both: https://support.hdfgroup.org/pubs/papers/Big_HDF_FAQs.pdf

And can we use Hadoop tools on HDF5 files?

Again, many options including:

  • Spark Connector (part of Enterprise Support Edition)
  • HDFS Virtual File Driver (also part of Enterprise Support Edition)