Issue with h5pyd Table class

theo.plantefol34 · January 20, 2025, 9:35am

Hi, I am using the h5pyd Table class and its read_where() method to retrieve specific data from my dataset. I think there is an issue when a column name contains a number.
My dataset has two identical columns, one named “column_A” and the other “column_1”.
When I run a query against column_A, everything works fine, but when I run the same query against column_1, I get the following error:

Traceback (most recent call last):
  File "C:\Users\TheoPLANTEFOL\Desktop\scripts\LOT3\4_hsds_queries.py", line 38, in <module>
    results = table.read_where(condition)      # note : include_index does not work
  File "C:\Users\TheoPLANTEFOL\Desktop\scripts\win-venv\Lib\site-packages\h5pyd\_hl\table.py", line 204, in read_where
    rsp = self.GET(req, params=params)
  File "C:\Users\TheoPLANTEFOL\Desktop\scripts\win-venv\Lib\site-packages\h5pyd\_hl\base.py", line 973, in GET
    rsp = self.id._http_conn.GET(req, params=params, headers=headers, format=format, use_cache=use_cache)
  File "C:\Users\TheoPLANTEFOL\Desktop\scripts\win-venv\Lib\site-packages\h5pyd\_hl\httpconn.py", line 522, in GET
    body = json.loads(rsp.text)
  File "C:\Program Files\Python313\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "C:\Program Files\Python313\Lib\json\decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python313\Lib\json\decoder.py", line 363, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This is my code:

import h5pyd

with h5pyd.File("hdf5://home/tests/dataset.h5", "r") as file:
    table = file.get("table")
    results_A = table.read_where(condition="column_A < 20") # works
    results_1 = table.read_where(condition="column_1 < 20") # does not work
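
Until a server-side fix is available, one possible stopgap is to evaluate the same condition on the client with NumPy. This is a minimal sketch, assuming the table fits in memory and that slicing the h5pyd table (e.g. `table[:]`) returns a NumPy structured array; the helper `filter_rows` and the sample values are hypothetical.

```python
import numpy as np

def filter_rows(rows: np.ndarray, column: str, limit: float) -> np.ndarray:
    """Hypothetical helper: keep rows of a structured array where rows[column] < limit.

    Mirrors a condition such as "column_1 < 20", but evaluated locally.
    """
    mask = rows[column] < limit  # field names are plain string keys, digits included
    return rows[mask]

# Local stand-in for the table contents (made-up values for illustration).
rows = np.array(
    [(5, 5), (25, 25), (12, 12)],
    dtype=[("column_A", "i4"), ("column_1", "i4")],
)
results_1 = filter_rows(rows, "column_1", 20)
print(results_1["column_1"].tolist())  # -> [5, 12]
```

Against the server this would be called as `filter_rows(table[:], "column_1", 20)`, which transfers the whole table; read_where() avoids that by evaluating the query inside HSDS.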

jreadey · January 21, 2025, 1:29pm

Yes, you’re right! The logic used to parse the query expressions could be a bit more robust.
I’ve pushed a fix to HSDS master. If you don’t want to build the image yourself, you can pull it from Docker Hub as hdfgroup/hsds:sha-c3ec61b.

Let me know if that resolves the issue for you.
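
To illustrate the kind of parsing problem involved (this is not HSDS’s actual parser), a query tokenizer should treat an identifier as a letter or underscore followed by any mix of letters, digits, and underscores, so that `column_1` stays a single token instead of being split around the digit. This is a minimal sketch using Python’s `re` module; the function `tokenize_condition` and the supported operator set are assumptions for illustration.

```python
import re

# Hypothetical token grammar for simple conditions such as "column_1 < 20".
# Identifiers may contain digits after their first character.
TOKEN_RE = re.compile(
    r"\s*(?:"
    r"(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"      # column names: column_A, column_1
    r"|(?P<number>\d+(?:\.\d*)?|\.\d+)"        # numeric literals: 20, 3.5, .5
    r"|(?P<op><=|>=|==|!=|<|>|&|\||\(|\))"     # comparisons, logical ops, parentheses
    r")"
)

def tokenize_condition(expr: str) -> list[tuple[str, str]]:
    """Split a condition string into (kind, text) tokens; raise on anything unrecognized."""
    tokens = []
    pos = 0
    expr = expr.rstrip()  # trailing whitespace would otherwise be an unmatched remainder
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

print(tokenize_condition("column_1 < 20"))
# -> [('ident', 'column_1'), ('op', '<'), ('number', '20')]
```

Listing the identifier rule first and letting it match greedily is what keeps a trailing digit attached to the column name.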

theo.plantefol34 · January 22, 2025, 10:15am

Hi @jreadey, I just ran my script with the new Docker image and now it works well with columns that contain numbers. Thank you very much for your quick fix!
By the way, I start HSDS by running runall.sh, but it pulls and uses the image with the ‘latest’ tag, even though I pulled hdfgroup/hsds:sha-c3ec61b. To get around that, I modified admin/docker/docker-compose.azure.yml so that it pulls the right image. Is there a way to specify which image to use when running runall.sh?

jreadey · January 23, 2025, 3:24am

Great! Glad the fix worked for you.

Sorry, there’s not a runall.sh option for the image tag. But you can just tag whichever image you want to use as latest and that should work. E.g.:

$ docker tag hdfgroup/hsds:sha-c3ec61b hdfgroup/hsds:latest

theo.plantefol34 · January 23, 2025, 4:33pm

Would it be possible in the future to pass a selection of columns as an argument to Table.read_where(), like pandas.read_hdf()?
Also, some arguments in the definition of read_where(), such as step and include_index, do not seem to work.
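
Until read_where() supports these directly, similar effects can be approximated on the client once the rows are in memory. This is a minimal sketch, assuming the rows are available as a NumPy structured array; `select_where` is a hypothetical helper, and its `columns`, `step`, and `include_index` parameters only mimic the intent discussed here, not any h5pyd signature. Here `step` thins the rows that are examined, like a range() step.

```python
import numpy as np

def select_where(rows, column, limit, columns=None, step=1, include_index=False):
    """Hypothetical client-side stand-in for read_where() with extra options.

    Examines every `step`-th row, keeps those where rows[column] < limit,
    optionally restricts the result to the fields in `columns`, and
    optionally returns the original row indices alongside the data.
    """
    candidates = np.arange(0, len(rows), step)            # row positions examined
    index = candidates[rows[column][candidates] < limit]  # positions meeting the condition
    result = rows[index]
    if columns is not None:
        result = result[list(columns)]  # multi-field selection on a structured array
    return (index, result) if include_index else result

# Made-up sample rows for illustration.
rows = np.array(
    [(1, 10, 0.5), (2, 30, 1.5), (3, 15, 2.5), (4, 5, 3.5)],
    dtype=[("id", "i4"), ("column_1", "i4"), ("value", "f8")],
)
index, picked = select_where(rows, "column_1", 20, columns=["id", "value"], include_index=True)
print(index.tolist(), picked["id"].tolist())  # -> [0, 2, 3] [1, 3, 4]
```

Because the whole table has to be read first, this is only a stopgap for modestly sized tables.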

jreadey · January 24, 2025, 10:11am

I’m working on a major h5pyd update now (though it will be some time before it’s available). I’ll look into adding the column argument as well as what’s wrong with step and include_index.

Are there other Pandas DataFrame methods you’d like to see?

