What are the key features of HDF5 compared with other big data platforms?

Dear all,

I used HDF5 for a project. I have also read some documents about other big
data solutions like Hadoop, and I know there are some articles that compare
the two.
But I am still not clear on which problems HDF5 suits best and Hadoop does
not, and why there are only a few commercial applications based on HDF5
while Hadoop has many.

Kind regards,
Long

Hi, Long!

  Would you please list the articles that compare both of them? I'm
interested in reading them, too. If Big HDF FAQs [1] is not in your
list, you may want to read it first. As the FAQs document says,
comparing them is not straightforward so you're not alone if you're
not clear about the issue.

  In my opinion, HDF5 is for science and Hadoop is for
analytics. The presentation [2] summarizes this well: big data as
output (HDF5) and big data as input (Hadoop).

  Not many organizations do big science other than national labs with
supercomputers. Commercial companies want to do big data analytics and
pay for Hadoop solutions without owning supercomputers. Here, cloud
computing is ideal for companies.

  That explains a little bit why only a few commercial applications
(MATLAB, IDL, Mathematica, etc.) support HDF5. Scientists can program
what they want for their simulation. Fortran and Python may suffice for
them.

  However, I think the main reason is that the marketing effort of The
HDF Group, as a non-profit, has not been strong and the learning curve
of HDF5 for average users is quite steep (no easy-to-use killer
application like Excel, difficult to install, complex APIs and file
format specification, etc.)

[1] http://www.hdfgroup.org/pubs/papers/Big_HDF_FAQs.pdf
[2] http://www.csm.ornl.gov/workshops/SOS17/documents/Plimpton_sos_Mar13.pdf

--
Save the Earth. Save Earth data in HDF-EOS. Save Big data in HDF.

On Thu, Nov 13, 2014 at 8:47 AM, long zhao <comlong@gmail.com> wrote:

Dear,

I used HDF5 for a project. And I also read some documents about other big
data solutions like Hadoop, and also I know there are some articles compare
both of them.
But I am still not too clear that what the issue is best for HDF5 but not
for hadoop. And why there are only few commercial application based on HDF5
but hadoop has a lot.

Kind regards,
Long

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Thanks a lot for your reply. The article I read before was just the "Big HDF
FAQs.pdf". I used HDF5 + iRODS (a distributed file system), so if the purpose
is just to store very large data, both approaches (Hadoop and HDF5)
could work. And HDF5 may perform better if the user needs to analyze the
data in batches.
But people almost always choose Hadoop over HDF5, which is confusing.
If the difference is "big data as output (HDF5) and big data as input
(Hadoop)", then HDF5 would be much more popular for BI. But there seems to
be no product for it.

Long

On Thu, Nov 13, 2014 at 6:42 PM, H. Joe Lee <hyoklee@hdfgroup.org> wrote:

> Hi, Long!
>
> Would you please list the articles that compare both of them? I'm
> interested in reading them, too. If Big HDF FAQs [1] is not in your
> list, you may want to read it first. As the FAQs document says,
> comparing them is not straightforward so you're not alone if you're
> not clear about the issue.

Thanks for this very informative email!

> In my opinion, HDF5 is for science and Hadoop is for
> analytics. The presentation [2] summarizes this well: big data as
> output (HDF5) and big data as input (Hadoop).
>
> Not many organizations do big science other than national labs with
> supercomputers. Commercial companies want to do big data analytics and
> pay for Hadoop solutions without owning supercomputers. Here, cloud
> computing is ideal for companies.
>
> That explains a little bit why only a few commercial applications
> (MATLAB, IDL, Mathematica, etc.) support HDF5. Scientists can program
> what they want for their simulation. Fortran and Python may suffice for
> them.
>
> However, I think the main reason is that the marketing effort of The
> HDF Group, as a non-profit, has not been strong and the learning curve
> of HDF5 for average users is quite steep (no easy-to-use killer
> application like Excel, difficult to install, complex APIs and file
> format specification, etc.)

That is true indeed! I have a (very) hard time convincing project partners
to try out HDF5: there is lots of "user" and "reference" manual material
available, but what is missing is a set of tutorials that show how to solve
particular use cases with HDF5. For example, the ones I would welcome most
in the context of the ("robotics") projects I am involved in are:
- "streaming" HDF5 data over sockets (the streaming interface was abandoned
   some years ago... :-( );
- REST interactions ("queries") with HDF5, including from C/JavaScript or
   HTML5 "web apps";
- how to integrate HDF5 with semantic domain models on top (like XDMF), and
   reference implementations to support this;
- examples of "in process shared memory" access to HDF5 data structures in
   RAM (there is the MPI driver, but its use case is quite the opposite of
   what I need most, which is deterministic realtime efficiency).
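To make the first item in the list above a bit more concrete: one way to
approximate "streaming" without a dedicated streaming interface might be to
ship self-contained chunks of bytes (for example, complete HDF5 file images
built in memory) as length-prefixed messages over a socket. The sketch below
shows only that framing step, using nothing but the Python standard library.
It is not an HDF5 API. The names send_frame, recv_exact, recv_frame and demo
are hypothetical, the 8-byte length header is an arbitrary choice, and the
payloads are placeholder bytes rather than real HDF5 files.

```python
import socket
import struct

# Assumption: each message is prefixed by an 8-byte big-endian length.
HEADER = struct.Struct("!Q")


def send_frame(sock, payload):
    """Send one length-prefixed message (payload is opaque bytes)."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes the socket early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed in the middle of a frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Receive one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo():
    """Send a few small placeholder chunks through a local socket pair."""
    sender, receiver = socket.socketpair()
    try:
        # Placeholder payloads; a real sender might put an HDF5 file image here.
        frames = [b"chunk-0", b"", b"chunk-2 (placeholder bytes)"]
        for frame in frames:
            send_frame(sender, frame)
        return [recv_frame(receiver) for _ in frames]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    for payload in demo():
        print(len(payload), payload)
```

The receiver only needs the length header to know where one chunk ends and
the next begins, so each chunk can be handed to whatever opens it on the
other side. Whether this meets deterministic realtime requirements is a
separate question that this sketch does not address.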

I am quite sure that all these things can work (and we have done some proofs
of concept ourselves over the last couple of years), but there is no "out of
the box" information available for these use cases on the HDF Group website.

Are there any "hackathons" or similar events out there where people from my
group could work together with other groups on some of the above-mentioned
issues, with the result being hosted/supported/linked to from the official
HDF Group website?

[1] http://www.hdfgroup.org/pubs/papers/Big_HDF_FAQs.pdf
[2] http://www.csm.ornl.gov/workshops/SOS17/documents/Plimpton_sos_Mar13.pdf

Best regards,

Herman Bruyninckx
