Hi, Long!
Would you please list the articles that compare both of them? I'm
interested in reading them, too. If the Big HDF FAQs document [1] is not
in your list, you may want to read it first. As the FAQs document says,
comparing them is not straightforward, so you're not alone if you're
not clear about the issue.
Thanks for this very informative email!
In my opinion, HDF5 is for science and Hadoop is for
analytics. The presentation [2] summarizes this well: big data as
output (HDF5) and big data as input (Hadoop).
Not many organizations do big science other than national labs with
supercomputers.
Commercial companies want to do big data analytics and pay for
Hadoop solutions without owning supercomputers. Here, cloud computing
is ideal for them.
That explains in part why only a few commercial applications
(MATLAB, IDL, Mathematica, etc.) support HDF5. Scientists can program
what they want for their simulations; Fortran and Python may suffice for
them.
However, I think the main reason is that the marketing effort of The
HDF Group, as a non-profit, has not been strong, and the learning curve
of HDF5 for average users is quite steep (no easy-to-use killer
application like Excel, difficult installation, complex APIs and file
format specification, etc.).
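For readers who have never touched the API: the basic write/read path
through the h5py Python bindings is fairly short. This is only a sketch
(the file and dataset names are made up), and the steepness referred to
above arguably lives more in the C API, chunking, hyperslab selection,
and the file format spec than in this entry-level case:

import h5py
import numpy as np

# Write a small file: one gzip-compressed dataset plus an attribute.
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("temperature", data=np.random.rand(100, 3),
                            compression="gzip")
    dset.attrs["units"] = "K"

# Read part of it back.
with h5py.File("example.h5", "r") as f:
    first_rows = f["temperature"][:10]
    print(f["temperature"].attrs["units"], first_rows.shape)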
That is true indeed! I have a (very) hard time convincing project partners
to try out HDF5: there is lots of "user" and "reference" manual material
available, but what is missing is a set of tutorials that show how to solve
particular use cases with HDF5. For example, the ones I would welcome most
in the context of the ("robotics") projects I am involved in are:
- "streaming" HDF5 data over sockets (the streaming interface was abandoned
some years ago,... )
- REST interactions ("queries") with HDF5, including from C/JavaScript or HTML5 "web apps";
- how to integrate HDF5 with semantic domain models on top (like XDMF), and
reference implementations to support this;
- examples of "in process shared memory" access to HDF5 data structures in
RAM (there is the MPI driver but its use case is quite the opposite of
what I need most, that is deterministic realtime efficiency).
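To make the first and last items a bit more concrete, here is a rough
sketch, not an official recipe: it builds an HDF5 file entirely in memory
and ships the raw file image over a plain TCP socket. Host, port, and
dataset names are invented; it assumes h5py >= 2.9 (file-like object
support), NumPy, and Python 3.8+ (socket.create_server). The same BytesIO
trick, or h5py's driver="core" with backing_store=False, also covers simple
in-RAM access, though not the shared-memory or deterministic real-time
parts.

import io
import socket

import h5py
import numpy as np


def build_file_image() -> bytes:
    """Write a small HDF5 file into an in-memory buffer and return its bytes."""
    buf = io.BytesIO()
    with h5py.File(buf, "w") as f:  # h5py can write directly to file-like objects
        f.create_dataset("joint_angles", data=np.random.rand(1000, 7))
        f.attrs["robot"] = "example-arm"
    return buf.getvalue()


def send_image(image: bytes, host: str = "localhost", port: int = 9999) -> None:
    """Send the file image over TCP, prefixed with its length."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(len(image).to_bytes(8, "big"))
        sock.sendall(image)


def receive_image(port: int = 9999) -> bytes:
    """Accept one connection and read back a single length-prefixed file image."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            size = int.from_bytes(_read_exact(conn, 8), "big")
            return _read_exact(conn, size)


def _read_exact(conn: socket.socket, n: int) -> bytes:
    chunks = []
    while n > 0:
        chunk = conn.recv(n)
        if not chunk:
            raise ConnectionError("socket closed before the full image arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


# Receiver side: reopen the received bytes as a read-only in-memory HDF5 file.
#   image = receive_image()
#   with h5py.File(io.BytesIO(image), "r") as f:
#       print(f["joint_angles"].shape)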
I am quite sure that all these things can work (and we have done some proofs
of concept ourselves over the last couple of years), but there is no "out of
the box" information available for these use cases on the HDF Group website.
Are there any "hackatons" or similar events out there where people from my
group could work together with other groups in some of the above-mentioned
issues, with the result being hosted/supported/linked to from the official
HDFgroup website?
[1] http://www.hdfgroup.org/pubs/papers/Big_HDF_FAQs.pdf
[2] http://www.csm.ornl.gov/workshops/SOS17/documents/Plimpton_sos_Mar13.pdf
Best regards,
Herman Bruyninckx
On Thu, 13 Nov 2014, H. Joe Lee wrote:
--
Save the Earth. Save Earth data in HDF-EOS. Save Big data in HDF.
On Thu, Nov 13, 2014 at 8:47 AM, long zhao <comlong@gmail.com> wrote:
Dear,
I used HDF5 for a project, and I have also read some documents about other
big data solutions such as Hadoop; I know there are some articles that
compare the two.
But I am still not clear about which problems HDF5 is best suited for rather
than Hadoop, and why there are only a few commercial applications based on
HDF5 while Hadoop has many.
Kind regards,
Long