Large File Support on Windows

Hi,

I am having some difficulty reading data from a large file (about 9 GB) using
HDF5-1.8.5-patch1, which was built under Windows XP (32-bit) using Visual
Studio. The raw data is held in an external binary file that is accessed
via a dataset created in an HDF5 file. I can access the data up to
about the 2 GB point in the file, which makes me think this is probably a
large-file issue.

I know it is possible to read all of the data from my large file in C using
_fseeki64 and fread (I have C code that does this).
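
For reference, a minimal sketch of the kind of direct C access described above: seek to a 64-bit offset, then fread. The helper names seek64 and read_at are made up for illustration; this is not Paul's code and not part of HDF5. The non-Windows branch assumes a POSIX system where fseeko is available.

```c
/* Sketch only: seek64 and read_at are hypothetical helpers, not HDF5 API. */
#define _FILE_OFFSET_BITS 64      /* ask POSIX systems for a 64-bit off_t */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko() in <stdio.h> */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifndef _WIN32
#include <sys/types.h>            /* off_t */
#endif

/* Seek to an absolute 64-bit byte offset; returns 0 on success. */
int seek64(FILE *stream, int64_t offset)
{
#ifdef _WIN32
    return _fseeki64(stream, offset, SEEK_SET);      /* MSVC C runtime */
#else
    return fseeko(stream, (off_t)offset, SEEK_SET);  /* POSIX */
#endif
}

/* Read up to nbytes starting at offset; returns the number of bytes read. */
size_t read_at(FILE *stream, int64_t offset, void *buf, size_t nbytes)
{
    assert(stream != NULL && buf != NULL && offset >= 0);
    if (seek64(stream, offset) != 0)
        return 0;
    return fread(buf, 1, nbytes, stream);
}
```

With a file opened via fopen(path, "rb"), read_at(fp, offset, buf, n) returns how many bytes were actually read, which is less than n at end of file or on error.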

Perhaps I am missing a compilation option in Visual Studio?

I would be grateful if someone could point me towards a solution.

Regards

Paul

···

--
View this message in context: http://hdf-forum.184993.n3.nabble.com/Large-File-Support-on-Windows-tp1589826p1589826.html
Sent from the hdf-forum mailing list archive at Nabble.com.

Paul,

The HDF5 library should handle this. What kind of errors are you getting?

Thank you!

Elena

···

On Sep 27, 2010, at 10:33 AM, seismic wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

This is probably NOT common, but just to make sure: are you using the
default Virtual File Driver (VFD) when reading on Windows? If not, that
could be your problem. The VFD is controlled via file access property
lists, using one of the H5Pset_fapl_* calls.

For example, I had code developed on Linux that set the stdio VFD by
fiat. When it ran on Windows, it was using the stdio VFD there
too. But the stdio VFD was really intended for Linux. It *should* work
fine on Windows, but I didn't want to chance it. So I added #if _WIN32
compilation logic to NOT set the VFD when on Windows.

Mark

···

On Mon, 2010-09-27 at 08:58, Elena Pourmal wrote:

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

Hi Elena,

Thanks for the reply. Here is the error I am getting:

HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: ..\..\..\src\H5Dio.c line 174 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ..\..\..\src\H5Dio.c line 404 in H5D_read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ..\..\..\src\H5Dcontig.c line 520 in H5D_contig_read(): contiguous read failed
    major: Dataset
    minor: Read failed
  #003: ..\..\..\src\H5Dscatgath.c line 516 in H5D_scatgath_read(): file gather failed
    major: Low-level I/O
    minor: Read failed
  #004: ..\..\..\src\H5Dscatgath.c line 252 in H5D_gather_file(): read error
    major: Dataspace
    minor: Read failed
  #005: ..\..\..\src\H5Defl.c line 451 in H5D_efl_readvv(): block write failed
    major: Low-level I/O
    minor: Write failed
  #006: ..\..\..\src\H5Defl.c line 280 in H5D_efl_read(): external file address overflowed
    major: External file list
    minor: Address overflowed

A bit more information: I am trying to read part of a 1D dataset that
consists of a compound data type. The raw data is held in an external binary
file. I can use the HDF5 library (C++ bindings) to read the data from the
external file okay, except when I try to access data past about the 2 GB
point.

The error message above "external file address overflowed" looks
suspiciously like a large file access problem?

I have also had a quick look at the HDF5 source code. It seems to use
'fseeko' if it is available; otherwise it uses 'fseek' to position the
file pointer. I suspect this could be why it doesn't work on 32-bit Windows.
I believe Windows uses _fseeki64 rather than fseeko.
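
One way to see why the failure would show up at about 2 GB: plain fseek takes its offset as a long, and long is 32 bits wide in Visual Studio builds, so the largest offset it can express is 2^31 - 1 bytes. The sketch below only illustrates that limit. It is not the check HDF5 performs, and fits_in_32bit_offset is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only (hypothetical helper, not HDF5 code):
 * true if a byte offset can be held in a signed 32-bit offset type. */
bool fits_in_32bit_offset(uint64_t offset)
{
    return offset <= (uint64_t)INT32_MAX;   /* INT32_MAX == 2^31 - 1 */
}
```

Any offset into the upper 7 GB of a 9 GB file fails this test, which matches reads succeeding only up to roughly the 2 GB point.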

Regards

Paul

···

Hi Paul,

···

On Sep 28, 2010, at 7:00 AM, seismic wrote:

  Ah, yes, you are probably correct. I've filed a bug in our issue tracker to investigate and correct this.

  Thanks,
    Quincey

Hi Quincey,

After further testing, I have verified that I can successfully read
past the 2 GB point if the data is held within the HDF5 file itself.
So it appears that the problem is confined to cases where the raw data is
held in an external file.

Can you give me an idea of when this problem could be looked at? If it may
take some time, I need to find a workaround. Unfortunately, I am confined to
the Windows platform. I would also prefer not to have to import my raw
data into an HDF file. One of the attractive features of HDF was
that I could use it to create a wrapper around pre-existing binary files.

Best Regards

Paul

···

Hi Paul,

···

On Sep 29, 2010, at 5:44 AM, seismic wrote:

  Can you try with the 1.8.6 release candidate:

http://www.hdfgroup.uiuc.edu/ftp/pub/outgoing/hdf5/hdf5-1.8.6-pre1/

  And let us know if that still has this problem?

  Thanks,
    Quincey

Hi Quincey,

I've just tried this and I have the same problem.

Thanks

Paul

···

The issue might be WINDOWS_MAX_BUF, which is defined as (1024*1024*1024) in h5pubconf.h and used in the H5FDwindows.c code. The value is hard-coded on Windows.
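
To make the idea of a per-request buffer cap concrete, here is a rough sketch of a read loop that never asks for more than a fixed number of bytes at once. It is an assumption about the general technique only; I have not checked how H5FDwindows.c actually uses WINDOWS_MAX_BUF. The names MAX_IO_CHUNK and read_in_chunks are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical cap for illustration; same value as the one quoted above. */
#define MAX_IO_CHUNK ((size_t)1024 * 1024 * 1024)

/* Read `total` bytes from `stream` into `buf`, never requesting more than
 * `max_chunk` bytes in a single fread call. Returns the bytes actually read. */
size_t read_in_chunks(FILE *stream, void *buf, size_t total, size_t max_chunk)
{
    unsigned char *dst = buf;
    size_t done = 0;

    assert(stream != NULL && buf != NULL);
    if (max_chunk == 0)
        return 0;

    while (done < total) {
        size_t want = total - done;
        if (want > max_chunk)
            want = max_chunk;

        size_t got = fread(dst + done, 1, want, stream);
        done += got;
        if (got < want)   /* end of file or error: stop early */
            break;
    }
    return done;
}
```

A caller would pass MAX_IO_CHUNK as the last argument. A cap like this limits the size of each request, not the file offset, so on its own it would not be expected to produce a failure exactly at the 2 GB point.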

Allen

Hi Paul,

···

On Sep 29, 2010, at 10:22 AM, seismic wrote:

  *drat* OK, we're investigating it now.

    Quincey

I see this too: 1.8.4 patch 1, using PyTables. Command and output are below.

My file size is ~1.1 GB. The table opens in HDFView and, as far as I can
tell, is working in our applications.

···

----------------------------------------

for i in bhdf.root.alignments.intergenic.refmap.iterrows(1,10): print i

...
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140665648559872:
  #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ../../../src/H5Dio.c line 404 in H5D_read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ../../../src/H5Dchunk.c line 1715 in H5D_chunk_read(): error looking up chunk address
    major: Dataset
    minor: Can't get value
  #003: ../../../src/H5Dchunk.c line 2284 in H5D_chunk_get_info(): can't query chunk address
    major: Dataset
    minor: Can't get value
  #004: ../../../src/H5Dbtree.c line 1010 in H5D_btree_idx_get_addr(): can't get chunk info
    major: Dataset
    minor: Can't get value
  #005: ../../../src/H5B.c line 332 in H5B_find(): unable to load B-tree node
    major: B-Tree node
    minor: Unable to load metadata into cache
  #006: ../../../src/H5AC.c line 1831 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #007: ../../../src/H5C.c line 6160 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: ../../../src/H5C.c line 10990 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: ../../../src/H5Bcache.c line 201 in H5B_load(): can't read B-tree node
    major: B-Tree node
    minor: Read failed
  #010: ../../../src/H5Fio.c line 113 in H5F_block_read(): read from metadata accumulator failed
    major: Low-level I/O
    minor: Read failed
  #011: ../../../src/H5Faccum.c line 196 in H5F_accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #012: ../../../src/H5FDint.c line 142 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #013: ../../../src/H5FDsec2.c line 755 in H5FD_sec2_read(): file read failed
    major: Low-level I/O
    minor: Read failed
  #014: ../../../src/H5FDsec2.c line 755 in H5FD_sec2_read(): Bad file descriptor
    major: Internal error (too specific to document in detail)
    minor: System error message
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tableExtension.pyx", line 845, in tables.tableExtension.Row.__next__
  File "tableExtension.pyx", line 955, in tables.tableExtension.Row.__next__general
  File "tableExtension.pyx", line 565, in tables.tableExtension.Table._read_records
tables.exceptions.HDF5ExtError: Problems reading records.

And, I forgot to say - other tables can be accessed just fine in PyTables.

···

A final note - what I was experiencing seemed to be related to the Python interpreter. I could not reproduce this with scripts, nor could I reproduce it after restarting the interpreter.

Todd

···

On Oct 30, 2010, at 4:37 PM, todd wrote:
