My code crashes upon upgrade to OSX 10.7 Lion

Hello --

I've got some C++ code which has worked on OSX 10.6, as well as a
couple of linux machines. This weekend, I upgraded to OSX 10.7 Lion,
and now my code crashes (gdb trace below). I posted about this
function of mine a little while ago, but didn't get any responses:

"Is there a better way to read a vector of strings?"
http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/2011-June/004778.html

So since my code is crashing, and presumably the HDF5 people have good
unit tests, my code is doing something incorrectly... Can anybody give
me some suggestions?

thanks!

Amos.

void
SolutionSet::
loadBinaryFile(const std::string& filename,
               energy::pEnergyModel model) {
   H5::H5File file (filename, H5F_ACC_RDONLY);
   H5::StrType st(H5::PredType::C_S1, H5T_VARIABLE);
   H5::DataType dt = H5::PredType::NATIVE_DOUBLE;
   hsize_t dims[1];
   H5::DataSet dataset;

   dataset = file.openDataSet(_H5_scores);
   dataset.getSpace().getSimpleExtentDims(dims, NULL);
   Vector_double scores (dims[0]); // this is just std::vector<double>
   dataset.read(&scores[0],dt);

   for(std::size_t i=0; i<scores.size(); i++){
      std::stringstream confName;
      confName << _H5_conf << i;
      dataset = file.openDataSet(confName.str());
      dataset.getSpace().getSimpleExtentDims(dims, NULL);
      Vector_string conf(dims[0]); // this is just std::vector<std::string>
      for(std::size_t j=0; j<dims[0]; j++){
         H5::DataSpace file_space = dataset.getSpace();
         hsize_t count[1] = {1};
         hsize_t start[1] = {j};
         file_space.selectHyperslab(H5S_SELECT_SET,count,start);
         hsize_t dim1[1] = {1};
         H5::DataSpace mem_space(1,dim1);
         dataset.read(conf[j],st,mem_space,file_space); // this is line 393
      }

      std::stringstream crdName;
      crdName << _H5_coords << i;
      dataset = file.openDataSet(crdName.str());
      dataset.getSpace().getSimpleExtentDims(dims, NULL);
      Vector_double coords(dims[0]);
      dataset.read(&coords[0],dt);

      deserialize(model,conf,coords,scores[i]);
   }
}

Python(85973) malloc: *** error for object 0x103ff2820: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Program received signal SIGABRT, Aborted.
0x00007fff8ec7282a in __kill ()
(gdb) where
#0 0x00007fff8ec7282a in __kill ()
#1 0x00007fff8d999a9c in abort ()
#2 0x00007fff8d9f884c in free ()
#3 0x00007fff8c3fb702 in std::string::_Rep::_M_dispose ()
#4 0x00007fff8c3fcaab in std::string::_M_mutate ()
#5 0x00007fff8c3fcb2d in std::string::_M_replace_safe ()
#6 0x000000010412c175 in H5::DataSet::p_read_variable_len ()
#7 0x000000010412e660 in H5::DataSet::read ()
#8 0x00000001004d8ccc in triad::opt::SolutionSet::loadBinaryFile (this=0x7fff5fbfeba0, filename=@0x1315c6c90, model=@0x7fff5fbfeba0) at SolutionSet.cc:393

Did you set that breakpoint? On Mac OS, these kinds of bugs can often be found more easily using the env vars documented in 'man malloc' and by using guard malloc, documented in 'man libgmalloc'.

On Tue, 6 Sep 2011 16:55:14 -0700, Amos Anderson said:

Python(85973) malloc: *** error for object 0x103ff2820: pointer being
freed was not allocated
*** set a breakpoint in malloc_error_break to debug

--
____________________________________________________________
Sean McBride, B. Eng sean@rogue-research.com
Rogue Research www.rogue-research.com
Mac Software Developer Montréal, Québec, Canada

Hello Amos,

I'm trying to help figure out the crash problem. Would it be possible
for you to send me your data file so I can investigate further?

Thanks,
Binh-Minh

-------------
The HDF Group

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hello Binh-Minh --

I extracted a unit test for myself which is completely independent of
our code, and I've attached it. However, this unit test passes
(there's no crash). This implies that the problem depends on something
else in our code.... Unfortunately, valgrind has not been updated to
work with OSX Lion yet... so no help there.

However, if I insert an extra line:
         H5::DataSpace mem_space(1,dim1);
         conf[j] = std::string("this is a long string"); // the extra line
         dataset.read(conf[j],st,mem_space,file_space);

i.e. initializing the std::string to something other than ""

Then my original code produces the correct result...

but then if I replace that line with:
conf[j] = std::string("");

then I get the crash. So I guess the question is how well HDF5 can
work with std::string instead of char*.... because the string ends up
with the right length when I initialized it to "this is a long
string", but if it's "" then it can't resize it?
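
As a point of reference for that question, here is a minimal sketch of the general pattern of copying a C buffer into a std::string and then releasing the buffer. It is plain C++ with no HDF5 calls; make_c_buffer and copy_and_release are hypothetical names, and this is only an assumption about the kind of handoff a wrapper would perform, not the library's actual code. In standard C++ the assignment copies the characters, so the previous contents of the target string (empty or not) should not matter:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C buffer that a variable-length read hands back.
char* make_c_buffer(const char* text) {
    char* buf = new char[std::strlen(text) + 1];
    std::strcpy(buf, text);
    return buf;
}

// Copy a NUL-terminated C buffer into `out`, then release the buffer.
// Returns the number of characters copied.
std::size_t copy_and_release(char* buf, std::string& out) {
    assert(buf != nullptr);
    out = buf;     // std::string makes its own copy; it does not adopt buf
    delete[] buf;  // safe: `out` no longer refers to buf's memory
    return out.size();
}
```

If this pattern behaves the same whether the target starts out empty or pre-filled, but the real read does not, that would point away from the pattern itself and toward the build or runtime environment.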

Hi Sean --

Did you set that breakpoint? On Mac OS, these kinds of bugs can often be found more easily using the env vars documented in 'man malloc' and by using guard malloc, documented in 'man libgmalloc'.

Yes, I did set the breakpoint, but it didn't give me any new
information. I'll have to give those env variables a try if we can't
figure it out without them.

Amos.

hdf5_str.cc (2.52 KB)

test.solns (1.85 MB)

On Wed, Sep 7, 2011 at 11:32 AM, Binh-Minh Ribler <bmribler@hdfgroup.org> wrote:

Hello Amos,

I'm trying to help figure out the crash problem. Would it be possible
for you to send me your data file so I can investigate further?

Thanks,
Binh-Minh
-------------
The HDF Group

--
~<>~<>~<>~<>~<>~<>~<>~<>~
Amos G. Anderson
+1-626-399-8958 (cell)

I haven't had the time to test this hypothesis but looking at the HDF5
C++ wrapping code it looks like it may not be terminating the string
before converting to C++ std::string if the H5Dread doesn't do that by
itself for some reason. Maybe worth trying this patch?

*** hdf5-1.8.7/c++/src/H5DataSet-fix.cpp 2011-09-28 11:44:09.000000000 +0100
--- hdf5-1.8.7/c++/src/H5DataSet.cpp 2011-04-20 22:23:08.000000000 +0100
***************
*** 721,729 ****
        throw DataSetIException("DataSet::read", "H5Dread failed for fixed length string");
    }
  
- // Terminate the string
- strg_C[attr_size]=0;
-
    // Get string from the C char* and release resource allocated locally
    strg = strg_C;
    delete []strg_C;
--- 721,726 ----

--
Bojan Nikolic || http://www.bnikolic.co.uk

Hi Bojan --

The crash is not happening in DataSet::p_read_fixed_len. Binh-Minh
Ribler and I tried a bunch of things but were unable to resolve it;
here's a summary. The crash is in DataSet::p_read_variable_len
on the line with:

strg = strg_C;

The error is a double free on strg:

Python(10981) malloc: *** error for object 0x104010820: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[amosa:10981] *** Process received signal ***
[amosa:10981] Signal: Abort trap: 6 (6)
[amosa:10981] Signal code: (0)
[amosa:10981] [ 0] 2 libsystem_c.dylib 0x00007fff8ce3bcfa _sigtramp + 26
[amosa:10981] [ 1] 3 libhdf5.7.dylib 0x00000001042ec3f1 H5S_select_hyperslab + 1393
[amosa:10981] [ 2] 4 libsystem_c.dylib 0x00007fff8ce3984c free + 389
[amosa:10981] [ 3] 5 libstdc++.6.dylib 0x00007fff8c8f3702 _ZNSs4_Rep10_M_disposeERKSaIcE + 60
[amosa:10981] [ 4] 6 libstdc++.6.dylib 0x00007fff8c8f4aab _ZNSs9_M_mutateEmmm + 281
[amosa:10981] [ 5] 7 libstdc++.6.dylib 0x00007fff8c8f4b2d _ZNSs15_M_replace_safeEmmPKcm + 37
[amosa:10981] [ 6] 8 libhdf5_cpp.7.dylib 0x000000010414af90 _ZNK2H57DataSet19p_read_variable_lenEiiiiRSs + 1316
[amosa:10981] [ 7] 9 libhdf5_cpp.7.dylib 0x000000010414d5ea _ZNK2H57DataSet4readERSsRKNS_8DataTypeERKNS_9DataSpaceES7_RKNS_19DSetMemXferPropListE + 1286

The problem does not occur in a standalone unit test I made (and there
is no problem on linux or osx 10.6). This implies the problem could be
somewhere else in the rest of my code, but unfortunately valgrind
doesn't work on osx 10.7 yet (and no problems have been detected on
other machines). However, I can make the problem go away if I
initialize the std::string in my code before calling DataSet::read:

   H5::StrType st(H5::PredType::C_S1, H5T_VARIABLE);
   H5::DataSet dataset;
      Vector_string conf(dims[0]);
#if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1070
         conf[j] = "hack to avoid crash";
#endif
         dataset.read(conf[j],st,mem_space,file_space);
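
The workaround above could be wrapped in a small helper so the seeding is not scattered through the code. This is only a sketch, assuming a C++11 compiler; read_with_seed is a hypothetical name, and the callback stands in for the dataset.read call so the helper itself has no HDF5 dependency:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: make sure `target` is not an empty string before
// handing it to `read`, mirroring the workaround shown above.
template <typename ReadFn>
void read_with_seed(std::string& target, ReadFn&& read) {
    if (target.empty()) {
        target = "hack to avoid crash";  // placeholder; read() is expected to overwrite it
    }
    std::forward<ReadFn>(read)(target);
}
```

With the names from the snippet above, the call site would look something like read_with_seed(conf[j], [&](std::string& s) { dataset.read(s, st, mem_space, file_space); }). Note that it only hides the symptom; it does not explain why an empty string triggers the crash.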

Doing a google search on the error message itself along with a few
keywords revealed this:
http://stackoverflow.com/questions/1962685/xcode-stl-c-debug-compile-error
and other similar posts. I am able to replicate those problems, which
sets a precedent for the problem being with the compiler itself. OSX
10.7 in general has seemed somewhat buggy too...

Amos.

On Wed, Sep 28, 2011 at 3:48 AM, Bojan Nikolic <bojan@bnikolic.co.uk> wrote:

I haven't had the time to test this hypothesis but looking at the HDF5
C++ wrapping code it looks like it may not be terminating the string
before converting to C++ std::string if the H5Dread doesn't do that by
itself for some reason. Maybe worth trying this patch?

If you are running on OS X 10.6, where is conf[j] created/initialized? Maybe the compiler is getting pickier in OS X 10.7?

___________________________________________________________
Mike Jackson www.bluequartz.net
Principal Software Engineer mike.jackson@bluequartz.net
BlueQuartz Software Dayton, Ohio

On Sep 28, 2011, at 1:28 PM, Amos Anderson wrote:

The problem does not occur in a standalone unit test I made (and there
is no problem on linux or osx 10.6). This implies the problem could be
somewhere else in the rest of my code, but unfortunately valgrind
doesn't work on osx 10.7 yet (and no problems have been detected on
other machines). However, I can make the problem go away if I
initialize the std::string in my code before calling DataSet::read:

  H5::StrType st(H5::PredType::C_S1, H5T_VARIABLE);
  H5::DataSet dataset;
     Vector_string conf(dims[0]);
#if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1070
        conf[j] = "hack to avoid crash";
#endif
        dataset.read(conf[j],st,mem_space,file_space);

Doing a google search on the error message itself along with a few
keywords revealed this:
http://stackoverflow.com/questions/1962685/xcode-stl-c-debug-compile-error
and other similar posts, problems which I am able to replicate, and
sets a precedent for the problem being with the compiler itself. OSX
10.7 in general has seemed somewhat buggy too...

If you build valgrind from svn trunk, it now kinda works on 10.7.

Apple's dev tools have similar memory debugging tools. There are env vars that affect malloc, see 'man malloc' and there is guard malloc, see 'man libgmalloc'.

Either of those might help you track it down.

On Wed, 28 Sep 2011 10:28:52 -0700, Amos Anderson said:

Python(10981) malloc: *** error for object 0x104010820: pointer being
freed was not allocated

*SNIP*

The problem does not occur in a standalone unit test I made (and there
is no problem on linux or osx 10.6). This implies the problem could be
somewhere else in the rest of my code, but unfortunately valgrind
doesn't work on osx 10.7 yet

--
____________________________________________________________
Sean McBride, B. Eng sean@rogue-research.com
Rogue Research www.rogue-research.com
Mac Software Developer Montréal, Québec, Canada