HDF FUSE

I am in the process of developing some generic HDF tools to make HDF files more accessible to regular applications.

In developing datasets for scientific visualization and animation I frequently must load 2D and 3D datasets into third-party software packages, some of which are open source and some are not. The output from these datasets is 3D graphics and other metadata.

It would be desirable to amalgamate regular files into HDF file datasets and be able to read those datasets back as simple files, so that programs that do not have HDF code would be able to read the files without modifying their application code. Applications that are HDF-savvy would be able to do more sophisticated operations on the datasets.

Last week I developed a prototype using MacFUSE (which is based on the SourceForge FUSE) that allows me to mount an HDF file and make it look like a regular file system. In the test I was able to generate FBX graphics files (part of another prototype) readable by the Maya animation package, insert them into an HDF file, mount the HDF file as a file system, and have QuickTime read the FBX datasets contained in the HDF file.

I am planning to develop two tools: hdfFUSEin and hdfFUSE.

hdfFUSEin would ingest a file and create a dataset under the group "/FUSE/". It would be a command-line program with arguments specifying the HDF filename, the ingest filename, and an optional alternate dataset name (the default would be the ingest filename). If the dataset name is already in use, an error would be returned and the ingest file would not be amalgamated.
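
To make this concrete, here is a rough sketch of what the core of hdfFUSEin might look like against the HDF5 1.8-style C API. The function name ingest_file and the abbreviated error handling are illustrative only, not a committed design:

/* hdfFUSEin sketch: copy a regular file into a contiguous 1-D char
   dataset under /FUSE/.  Error handling is abbreviated. */
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

static int ingest_file(const char *hdf_name, const char *in_name,
                       const char *dset_name)
{
    char path[1024], *buf;
    hsize_t size;
    FILE *in;

    /* Open the HDF file, creating it (and /FUSE) if necessary. */
    hid_t file = H5Fopen(hdf_name, H5F_ACC_RDWR, H5P_DEFAULT);
    if (file < 0)
        file = H5Fcreate(hdf_name, H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);
    if (H5Lexists(file, "/FUSE", H5P_DEFAULT) <= 0)
        H5Gclose(H5Gcreate2(file, "/FUSE", H5P_DEFAULT, H5P_DEFAULT,
                            H5P_DEFAULT));

    /* Refuse to clobber a dataset name that is already in use. */
    snprintf(path, sizeof(path), "/FUSE/%s", dset_name);
    if (H5Lexists(file, path, H5P_DEFAULT) > 0) {
        fprintf(stderr, "error: %s already in use\n", path);
        H5Fclose(file);
        return 1;
    }

    /* Slurp the ingest file; its byte length fixes the 1-D extent. */
    in = fopen(in_name, "rb");
    fseek(in, 0, SEEK_END);
    size = (hsize_t)ftell(in);
    rewind(in);
    buf = malloc(size);
    fread(buf, 1, size, in);
    fclose(in);

    /* One contiguous 1-D dataset of chars, nothing fancier. */
    hid_t space = H5Screate_simple(1, &size, NULL);
    hid_t dset  = H5Dcreate2(file, path, H5T_NATIVE_CHAR, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_CHAR, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    free(buf);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

A main() wrapping this gives the command line described above, e.g. "hdfFUSEin project.h5 scene.fbx".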

hdfFUSE would expose all /FUSE/ group datasets as files, mounted under a filesystem having the same name as the HDF file.

Initially it would be a read-only filesystem and a command-line program. Shortly after, I plan to make it work under the OS GUI, so that clicking the HDF file mounts its /FUSE/ datasets. Later on I would eliminate hdfFUSEin and make hdfFUSE a writable file system.
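
The read-only hdfFUSE could stay quite small as well. Here is a rough sketch of the FUSE callbacks that matter, assuming the high-level FUSE API and leaving out the clamping of reads at end-of-file; the names (hdf_getattr, dpath, etc.) are mine and nothing here is final:

/* hdfFUSE sketch: expose each /FUSE/ dataset as a read-only file. */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <hdf5.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static hid_t h5file;                /* opened read-only in main() */

/* Map the mount-point path "/name" onto the dataset "/FUSE/name". */
static void dpath(char *out, size_t n, const char *path)
{
    snprintf(out, n, "/FUSE%s", path);
}

static int hdf_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0555;
        return 0;
    }
    char name[1024];
    dpath(name, sizeof(name), path);
    hid_t dset = H5Dopen2(h5file, name, H5P_DEFAULT);
    if (dset < 0)
        return -ENOENT;
    st->st_mode = S_IFREG | 0444;   /* read-only, per the design */
    /* storage size == data size while datasets stay contiguous */
    st->st_size = (off_t)H5Dget_storage_size(dset);
    H5Dclose(dset);
    return 0;
}

/* H5Literate callback: turn each /FUSE/ link into a directory entry. */
struct dirctx { void *buf; fuse_fill_dir_t fill; };

static herr_t list_cb(hid_t g, const char *name, const H5L_info_t *info,
                      void *op_data)
{
    struct dirctx *ctx = op_data;
    ctx->fill(ctx->buf, name, NULL, 0);
    return 0;
}

static int hdf_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                       off_t off, struct fuse_file_info *fi)
{
    struct dirctx ctx = { buf, fill };
    fill(buf, ".", NULL, 0);
    fill(buf, "..", NULL, 0);
    return H5Literate_by_name(h5file, "/FUSE", H5_INDEX_NAME,
                              H5_ITER_NATIVE, NULL, list_cb, &ctx,
                              H5P_DEFAULT) < 0 ? -EIO : 0;
}

/* read() becomes a hyperslab selection on the 1-D char dataset. */
static int hdf_read(const char *path, char *buf, size_t size, off_t off,
                    struct fuse_file_info *fi)
{
    char name[1024];
    dpath(name, sizeof(name), path);
    hid_t dset = H5Dopen2(h5file, name, H5P_DEFAULT);
    if (dset < 0)
        return -ENOENT;
    hsize_t start = off, count = size;
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t mspace = H5Screate_simple(1, &count, NULL);
    H5Dread(dset, H5T_NATIVE_CHAR, mspace, fspace, H5P_DEFAULT, buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    return (int)size;
}

static struct fuse_operations hdf_ops = {
    .getattr = hdf_getattr,
    .readdir = hdf_readdir,
    .read    = hdf_read,
};

int main(int argc, char *argv[])
{
    /* usage: hdfFUSE file.h5 mountpoint [FUSE options] */
    h5file = H5Fopen(argv[1], H5F_ACC_RDONLY, H5P_DEFAULT);
    argv[1] = argv[0];              /* hand the rest to fuse_main() */
    return fuse_main(argc - 1, argv + 1, &hdf_ops, NULL);
}

Compiled against MacFUSE this is essentially the prototype described above, and the same source should build against the SourceForge FUSE on Linux.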

I do not think at this point it would be advisable to FUSE-mount every dataset in an HDF file as a regular file, but only specific datasets under the /FUSE/ group, such as 1-D character contiguous datasets.

Before I code this up I wanted to put out this RFC for design ideas.

thanks, matt

Matthew Dougherty
713-433-3849
National Center for Macromolecular Imaging
Baylor College of Medicine/Houston Texas USA

···

=========================================================================

Hi Matthew,

[...]

  Sounds very interesting, and something we've tossed around a few times but never had any funding to pursue. Traversing groups as directories and reading data from 1-D datasets is pretty obvious, but how do you handle creating datasets or reading datasets with >1 dimension? Is there some way to use ioctl() calls to pass along the extra information needed?

  Quincey

···

On Dec 17, 2007, at 4:49 PM, Dougherty, Matthew T. wrote:

[...]

=========================================================================


Dougherty, Matthew T. (on 2007-12-17 at 16:49:22 -0600) said:

[...]
Last week I developed a prototype using MacFUSE (which is based on the SourceForge FUSE) that allows me to mount an HDF file and make it look like a regular file system. In the test I was able to generate FBX graphics files (part of another prototype) readable by the Maya animation package, insert them into an HDF file, mount the HDF file as a file system, and have QuickTime read the FBX datasets contained in the HDF file.
[...]

This is really interesting, and we at Carabos dedicated some thought to it. We were however focusing on an implementation based on PyTables + python-fuse, but it was nothing more than a pipe dream. We somehow ended up turning the sock inside out and implementing a file-like interface to arrays in PyTables. :)

I will dedicate some more thought to the topic and see if I come up with some interesting comments.

Nice work!

Ivan Vilata i Balaguer
Cárabos Coop. V. -- Enjoy Data
http://www.carabos.com/

Hi Quincey,

If a user needs complexity, they should use the HDF API directly and do the heavy lifting themselves.

For simplicity I am imposing significant limitations:

1) I do not plan to traverse groups. At this point the objects under the group /FUSE/ are only datasets, no groups or attributes. Actually, other HDF code could put attributes on the /FUSE/ datasets, but hdfFUSE will not be looking for them.

2) I sidestep the n-D issue. I say only 1-D objects, because when I read the regular file, all I know is the name of the file and its length, assuming char values. Later on I could make allowances by looking at the file extension (e.g., '.mrc'), interpreting the file's header, creating a 2D/3D dataset, and storing the header somewhere else in HDF; but this has a lot of problems: I may want to chunk the n-D images, or use compression (obstacles which could be overcome). By sticking to the paradigm that the dataset was originally a regular file, the assumption is that some regular app can read it, even if the dataset is quite complex, like another ingested HDF file. If someone had an HDF file with some n-D dataset in it and linked it into /FUSE/, hdfFUSE would expose it as a regular file, with unknown consequences; the assumption being that whoever coded the link knows what they are doing and takes the risks.
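
That "is this dataset safe to expose?" test could be a few lines; a sketch (the name exposable is mine):

/* Sketch: hdfFUSE would expose a dataset only if it is 1-D,
   char-typed, and contiguous; anything else gets skipped. */
static int exposable(hid_t dset)
{
    hid_t space = H5Dget_space(dset);
    hid_t type  = H5Dget_type(dset);
    hid_t dcpl  = H5Dget_create_plist(dset);

    int ok = H5Sget_simple_extent_ndims(space) == 1
          && H5Tget_class(type) == H5T_INTEGER    /* char is integer class */
          && H5Tget_size(type) == 1
          && H5Pget_layout(dcpl) == H5D_CONTIGUOUS;

    H5Pclose(dcpl);
    H5Tclose(type);
    H5Sclose(space);
    return ok;
}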

It turns out I do not need to expose complex n-D datasets to the world through a regular file system, but rather to be able to ingest regular files into HDF as a means to amalgamate data transparently, so that other apps ignorant of HDF can interoperate with a single 'HDF' file containing all related datasets. Most of the files I have in mind are going to be graphics files, configuration files (e.g., Chimera map files, Amira session files), PDB files, Python files, or PDF documents. This HDF file becomes a nexus for an integrated project-management animation framework.

I looked for ioctl(); there is nothing in the high-level FUSE API, although there is something buried deep in another FUSE include file I could not locate. Considering you can take an MS NTFS disk drive and attach it to a Mac using FUSE, or connect complex devices (e.g., buttons, an accelerometer) and make them look like files, most likely it could be done.
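
One thing the high-level FUSE API does expose is getxattr(), so shape information could be published as an extended attribute instead of through ioctl(). A rough sketch, reusing the h5file handle from the earlier sketch; the attribute name "user.hdf5.dims" is just a placeholder, and note that MacFUSE's getxattr callback carries an extra position argument:

/* Sketch: answer getxattr("user.hdf5.dims") with the dataset shape,
   so HDF-savvy callers can recover rank/dims without ioctl(). */
static int hdf_getxattr(const char *path, const char *name,
                        char *value, size_t size)
{
    if (strcmp(name, "user.hdf5.dims") != 0)
        return -ENODATA;            /* ENOATTR on the Mac side */

    char dname[1024], text[256];
    snprintf(dname, sizeof(dname), "/FUSE%s", path);
    hid_t dset = H5Dopen2(h5file, dname, H5P_DEFAULT);
    if (dset < 0)
        return -ENOENT;

    hid_t space = H5Dget_space(dset);
    hsize_t dims[H5S_MAX_RANK];
    int rank = H5Sget_simple_extent_dims(space, dims, NULL);
    int len = snprintf(text, sizeof(text), "rank=%d dim0=%llu",
                       rank, (unsigned long long)dims[0]);
    H5Sclose(space);
    H5Dclose(dset);

    if (size == 0)                  /* caller is asking for the length */
        return len;
    if ((size_t)len > size)
        return -ERANGE;
    memcpy(value, text, (size_t)len);
    return len;
}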

Matthew Dougherty
713-433-3849
National Center for Macromolecular Imaging
Baylor College of Medicine/Houston Texas USA

···

=========================================================================


Hi Matthew,

[...]

  Sure, that makes sense for your use cases.

  BTW, if you'd like to contribute code for this functionality, we've helped set up a location for open source "addons" to HDF5 here:

  http://hdf5-addons.origo.ethz.ch/

  It's just getting off the ground, but it would be a good place to check your code in for others to see and/or improve.

  Quincey

···

On Dec 18, 2007, at 12:09 PM, Dougherty, Matthew T. wrote:

[...]

=========================================================================

Sounds good. How come here rather than SourceForge or the HDF Group website?

···

On Dec 18, 2007, at 9:43 PM, Quincey Koziol wrote:

  http://hdf5-addons.origo.ethz.ch/

Hi Matthew,

Sounds good. How come here rather than SourceForge or the HDF Group website?

  Well, we wanted to encourage a community of users to develop that wasn't entirely focused on having us at the center. Also, we wanted to be clear that this set of code is the community's, and that we aren't the ones maintaining it.

  Quincey

···

On Dec 18, 2007, at 9:53 PM, Matthew Dougherty wrote:

On Dec 18, 2007, at 9:43 PM, Quincey Koziol wrote:

  http://hdf5-addons.origo.ethz.ch/

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.