Hi,
I am having some difficulty reading data from a large file (about 9 GB) using HDF5-1.8.5-patch1, built under Windows XP (32-bit) with Visual Studio. The raw data is held in an external binary file which is accessed via a dataset created in an HDF5 file. I can access the data up to about the 2 GB point in the file, which makes me think this is probably a large-file issue.
I know it is possible to read all of the data from my large file in C using _fseeki64 and fread (I have C code that does this).
Perhaps I am missing a compilation option in Visual Studio?
I would be grateful if someone could point me towards a solution.
Regards,
Paul
Paul,
The HDF5 library should handle this. What kind of errors are you getting?
Thank you!
Elena
···
On Sep 27, 2010, at 10:33 AM, seismic wrote:
This is probably NOT common, but just to make sure: are you using the default Virtual File Driver (VFD) when reading on Windows? If not, that could be your problem. This is something you would control via file access property lists, with H5Pset_fapl... or some such call.
For example, I had code developed on Linux that set the stdio VFD by fiat. When it ran on Windows, that meant it was using the stdio VFD there too. But the stdio VFD was really intended for Linux. It *should* work fine on Windows, but I didn't want to chance it, so I added #if _WIN32 compilation logic to NOT set the VFD on Windows.
Mark
···
On Mon, 2010-09-27 at 08:58, Elena Pourmal wrote:
--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================ miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
Thanks for the reply. Here is the error I am getting:
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: ..\..\..\src\H5Dio.c line 174 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ..\..\..\src\H5Dio.c line 404 in H5D_read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ..\..\..\src\H5Dcontig.c line 520 in H5D_contig_read(): contiguous read failed
    major: Dataset
    minor: Read failed
  #003: ..\..\..\src\H5Dscatgath.c line 516 in H5D_scatgath_read(): file gather failed
    major: Low-level I/O
    minor: Read failed
  #004: ..\..\..\src\H5Dscatgath.c line 252 in H5D_gather_file(): read error
    major: Dataspace
    minor: Read failed
  #005: ..\..\..\src\H5Defl.c line 451 in H5D_efl_readvv(): block write failed
    major: Low-level I/O
    minor: Write failed
  #006: ..\..\..\src\H5Defl.c line 280 in H5D_efl_read(): external file address overflowed
    major: External file list
    minor: Address overflowed
A bit more information: I am trying to read part of a 1D dataset which consists of a compound datatype. The raw data is held in an external binary file. I can use the HDF5 library (C++ bindings) to read the data from the external file okay, except if I try to access data after about the 2 GB point.
The error message above, "external file address overflowed", looks suspiciously like a large-file access problem.
I have also had a quick look at the HDF5 source code. It seems to use 'fseeko' if it is available, otherwise falling back to 'fseek' to position the file pointer. I suspect this could be why it doesn't work on 32-bit Windows; I believe Windows uses _fseeki64 rather than fseeko.
Ah, yes, you are probably correct. I've filed a bug in our issue tracker to investigate and correct this.
After further testing, I have verified that I can successfully read past the 2 GB point if the data is actually held within the HDF5 file itself. So it appears the problem is confined to cases where the raw data is held in an external file.
Can you give me an idea of when this problem could be looked at? If it may take some time, I need to find a workaround. Unfortunately I am confined to the Windows platform, and I would also prefer not to have to import my raw data into an HDF5 file; one of the attractive features of HDF5 was that I could use it to create a wrapper around pre-existing binary files.
The issue might be the definition of WINDOWS_MAX_BUF (1024*1024*1024) in h5pubconf.h, which is used in the H5FDwindows.c code. This value is hard-coded whenever building on Windows.
I see this too: 1.8.4 patch 1, using PyTables. Command and output are below.
My file size is ~1.1 GB. The table opens in HDFView, and as far as I can tell it is working in our applications.
···
----------------------------------------
for i in bhdf.root.alignments.intergenic.refmap.iterrows(1,10): print i
...
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140665648559872:
  #000: ../../../src/H5Dio.c line 174 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ../../../src/H5Dio.c line 404 in H5D_read(): can't read data
    major: Dataset
    minor: Read failed
  #002: ../../../src/H5Dchunk.c line 1715 in H5D_chunk_read(): error looking up chunk address
    major: Dataset
    minor: Can't get value
  #003: ../../../src/H5Dchunk.c line 2284 in H5D_chunk_get_info(): can't query chunk address
    major: Dataset
    minor: Can't get value
  #004: ../../../src/H5Dbtree.c line 1010 in H5D_btree_idx_get_addr(): can't get chunk info
    major: Dataset
    minor: Can't get value
  #005: ../../../src/H5B.c line 332 in H5B_find(): unable to load B-tree node
    major: B-Tree node
    minor: Unable to load metadata into cache
  #006: ../../../src/H5AC.c line 1831 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #007: ../../../src/H5C.c line 6160 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: ../../../src/H5C.c line 10990 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: ../../../src/H5Bcache.c line 201 in H5B_load(): can't read B-tree node
    major: B-Tree node
    minor: Read failed
  #010: ../../../src/H5Fio.c line 113 in H5F_block_read(): read from metadata accumulator failed
    major: Low-level I/O
    minor: Read failed
  #011: ../../../src/H5Faccum.c line 196 in H5F_accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #012: ../../../src/H5FDint.c line 142 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #013: ../../../src/H5FDsec2.c line 755 in H5FD_sec2_read(): file read failed
    major: Low-level I/O
    minor: Read failed
  #014: ../../../src/H5FDsec2.c line 755 in H5FD_sec2_read(): Bad file descriptor
    major: Internal error (too specific to document in detail)
    minor: System error message
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "tableExtension.pyx", line 845, in tables.tableExtension.Row.__next__
File "tableExtension.pyx", line 955, in
tables.tableExtension.Row.__next__general
File "tableExtension.pyx", line 565, in
tables.tableExtension.Table._read_records
tables.exceptions.HDF5ExtError: Problems reading records.
A final note: what I was experiencing seemed to be related to the Python interpreter. I could not reproduce this with scripts, nor could I reproduce it after restarting the interpreter.
Todd
···
On Oct 30, 2010, at 4:37 PM, todd wrote:
And, I forgot to say: other tables can be accessed just fine in PyTables.