Modify

Ticket #12861 (closed Bugs: fixed)

Opened 2 months ago

Last modified 2 months ago

Segmentation fault when creating R-tree with packing algorithm with gcc4.8.2

Reported by: michael.moessner@… Owned by: awulkiew
Milestone: Boost 1.64.0 Component: geometry
Version: Boost 1.63.0 Severity: Problem
Keywords: rtree Cc:

Description

When creating an rtree with packing algorithm I get a segmentation fault. This error is extremly rare and only appears with 4.8.2 (no older compiler tested) and not on all computers.

Boost versions tested: 1.61.0, 1.63.0 Tested compiler where error occurs: gcc-4.8.2

Tested compiler where error doesn't occur: gcc-4.9.3

Compiler options tested (always fails for gcc-4.8.2): "-std=c++11 -O0 -ggdb" "-std=c++11 -O3"

Rtree-configurations tested:

  • Do not use packing, but insert elements (works)
  • Use quadratic/linear instead of rstar (fails always)
  • Use higher numbers for MaxElements? (works for the given example but can also fail if the rtree elements are changed, especially if the element number increases)
  • Instead of using std::pair as rtree-elements I tried std::tuple and boost::tuple. The error occurs for all elements.

Additional notes: This error occured for me first when I used rstar with MaxElements? = 16 where the element number was about 2 Mio. In this case the error is extremly sensible: e.g. Using one element more or less makes the error disappear. Also changing an element can make the error disappear. By changing the parameters I was able to reduce the problem size. For the given case the minimum number of elements I can use without error is 30684. For higher number the error occurs. However, using a complete other set of elements the error can appear at completely different element numbers.

Attachments

CMakeLists.txt (301 bytes) - added by michael.moessner@… 2 months ago.
CMake file to compile the code
test_tree.cpp (2.0 KB) - added by michael.moessner@… 2 months ago.
Example code where error occurs
valgrindout.txt (131.5 KB) - added by michael.moessner@… 2 months ago.
Valgrind output
testfile.dat.tar.gz.part1 (195.3 KB) - added by michael.moessner@… 2 months ago.
First part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)
testfile.dat.tar.gz.part2 (157.0 KB) - added by michael.moessner@… 2 months ago.
Second part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)
valgrind-error.txt (153.2 KB) - added by michael.moessner@… 2 months ago.
valgrind output on other system and with track-origins active
gdb-backtrace (9.7 KB) - added by michael.moessner@… 2 months ago.
gdb backtrace on new system
gdb-walkthrough (50.3 KB) - added by michael.moessner@… 2 months ago.
Insights into some variable given by gdb on new system

Change History

Changed 2 months ago by michael.moessner@…

CMake file to compile the code

Changed 2 months ago by michael.moessner@…

Example code where error occurs

Changed 2 months ago by michael.moessner@…

Valgrind output

Changed 2 months ago by michael.moessner@…

First part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)

Changed 2 months ago by michael.moessner@…

Second part of testfile needed for testing the code (use cat testfile.dat.tar.gz.part* > testfile.dat.tar.gz to join the files again)

comment:1 Changed 2 months ago by barendgehrels

  • Owner changed from barendgehrels to awulkiew

comment:2 Changed 2 months ago by awulkiew

Thanks for the extensive report!

I tested code/data with with gcc-4.8.2-19ubuntu1 (Ubuntu 14.04) and don't get the same result as you've got. I get an assertion failure regarding the validity of the boxes (when min > max) passed into the rtree however after compiling the code with -DNDEBUG (disabling assertions) I don't get any segfault (though the rtree probably contains garbage), valgrind (--tool=memcheck) also doesn't complain. Both with -O0 and -O3. I also tried all possible numbers of boxes between 30685 and 40000 (max) in a loop with -O3. And the same with valgrind but this time with a step 100 between the numbers of boxes.

The funny thing is that I don't get the assertion failure with gcc-4.8.5-2ubuntu1 (Linux Mint). It seems like the data is being serialized differently somehow. To be sure I put a check into the loop filling bounding_boxes vector:

    if(! bgi::detail::is_valid(box))
    {
        std::cout << bg::dsv(box) << std::endl;
        invalid_count++;
    }

and detected a lot of invalid boxes.

Could you check if that's happening on your machine as well?

Have you tried to run the program on a different machine?

EDIT: I managed to test the data with gcc-4.8.2 by getting rid of Serialization by generating textfile containing WKTs of boxes on a different machine and then loading them from textfile instead of binaryfile. The program works without segaults.

Last edited 2 months ago by awulkiew (previous) (diff)

comment:3 Changed 2 months ago by michael.moessner@…

Thank you for the detailed and quick response.

First of all, sorry for the trouble regarding the serialization. Due to the file size limitation this appeared to be the simplest solution and I hoped it would be portable enough.

I compiled gcc4.8.2 on another computer and was able to reproduce the error. Unfortunately, the underlying operating systems are similar (SUSE Linux Enterprise Desktop 11). The system setups basically differ in hardware and that on one system I used the preconfigured gcc4.8.2 and on the other system I compiled it myself.

I now also tried gcc4.8.1 and 5.2.0 on the new system which worked fine both.

Adding the valid-box check always passes without any complain.

I wonder if I should just switch my default compiler and assume that the error is due to a erroneous system setting?

For completeness I will attach the gdb backtrace, valgrind output with track-origins active, and a gdb walkthrough showing some variable values. Apparently one can see some "nan"s here (the input should be ok nonetheless since using "insert" instead of packing works fine). Note, that the outputs are different than in the files before since I created the on the other system.

Changed 2 months ago by michael.moessner@…

valgrind output on other system and with track-origins active

Changed 2 months ago by michael.moessner@…

gdb backtrace on new system

Changed 2 months ago by michael.moessner@…

Insights into some variable given by gdb on new system

comment:4 Changed 2 months ago by awulkiew

AFAIU these NaNs? are inside not yet initialized m_box inside expandable_box:

https://svn.boost.org/trac/boost/attachment/ticket/12861/gdb-walkthrough#L106

created on stack:

https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp#L273

per_level function is called recursively. Is it possible that the stack size is too small? I'm also concerned about the <error reading variable>:

https://svn.boost.org/trac/boost/attachment/ticket/12861/gdb-walkthrough#L105

Is it possible that this is caused by optimization enabled? I can see that there are some <optimized out> in the GDB output. If that's the case could you generate similar file with optimization disabled?

Or try to debug it in file pack_create.hpp around line 273, e.g. see if it always fails for the same data and then try to catch it. Maybe this compiler has something against this expression:

translator(*(first->second))

where first->second should be an iterator of Range you passed into the rtree ctor (it takes Range const&, so the iterator type should be std::vector<rtree_elem>::const_iterator), and translator's operator() should take const reference to Value type (std::pair<boost_box, int> const&) and return const reference to boost_box const&.

If you don't want to play with it, then I'll find some time in the near future to install SUSE ED 11 SP4 and gcc-4.8.2, and maybe I'll be able to reproduce it.

comment:5 Changed 2 months ago by anonymous

I tried setting the stack size to unlimited but the behaviour stays the same. I guess this would also be compiler independent?

It is not clear to me why there are <optimized out> variables since I compiled with -O0 and -ggdb. And I was not able to figure out yet where this comes from.

I tried investigating the error further and was able to track it down to the code section

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
			  _RandomAccessIterator __last,
			  const _Tp& __pivot, _Compare __comp)
    {
      while (true)
	{
	  while (__comp(*__first, __pivot))
	    ++__first;
	  --__last;
	  while (__comp(__pivot, *__last))
	    --__last;
	  if (!(__first < __last))
	    return __first;
	  std::iter_swap(__first, __last);
	  ++__first;
	}
    }

in std_algo.h. The Loop

while (__comp(*__first, __pivot))
	    ++__first;

goes on and on until __first becomes invalid. (It seems to go on for an actually large number over 6000 elements).

If there are further investigations I can do, please let me know.

comment:6 Changed 2 months ago by awulkiew

I was able to reproduce it. It is a bug in libstdc++ in std::nth_element() function, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800

For your data __unguarded_partition goes out of bounds for a case similar to this (values below are z coordinates of centroids of some boxes):

std::vector<float> vec = {3.94167948, 2.87522459, 4.17142391, 3.00844622};
std::nth_element(vec.begin(), vec.begin() + 2, vec.end());

It seems that std::sort doesn't have this problem so it could be called instead for specific version of gcc/libstdc++. So instead of a SEGFAULT the rtree would be slower. If I figured out how reliable is the timestamp defined by __GLIBCXX__ in this case I could conditionally call std::sort. At the GCC bug page they wrote that several GCC versions were affected (Fixed for 4.7.4/4.8.3/4.9.0.) and the timestamp may be set to different value for different compilers, so IDK. At the moment I could enable it for this specific version I've used, __GLIBCXX__ is 20131016. Is it the same for you? Do you have any suggestions?

As a workaround you can either update gcc/libstdc++ or alter your local copy of Boost by replacing std::nth_element calls with std::sort calls. Just use grep to find them in boost/geometry. There are some calls in the rstar balancing algorithm and one call in packing algorithm, here: https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp#L70 which could be replaced with:

std::sort(first, last, point_entries_comparer<I>());

EDIT: You mentioned earlier that it works fine with gcc 4.8.1. Are you sure about it? According to this bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59116 this version is also affected. And the timestamp for 4.8.1 is later than for 4.7.3 (https://gcc.gnu.org/develop.html#timeline) which AFAIU is affected as well.

EDIT2: I checked the code of several versions of libstdc++ (shipped with gcc 4.7.2-4, gcc 4.8.0-3 and gcc 4.9.0-1) and it seems that the only version containing the bug is gcc 4.8.2.

Last edited 2 months ago by awulkiew (previous) (diff)

comment:7 Changed 2 months ago by awulkiew

  • Status changed from new to closed
  • Resolution set to fixed
  • Milestone changed from To Be Determined to Boost 1.64.0

comment:8 Changed 2 months ago by anonymous

Thank you very much for investigating the error so quickly and tracking down the actual cause. I'm really glad that I can use the rtree now without any concerns of failing because of this unpredicted behaviour.

You guessed right about my specific GCC version to be 20131016.

For my part I will just switch to another gcc to solve this problem.

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.