Opened 17 months ago

Closed 16 months ago

Last modified 15 months ago

#12215 closed Bugs (fixed)

Boost.context: call stack corrupted on Windows using default fixedsize_stack

Reported by: runningwithscythes@…
Owned by: olli
Milestone: To Be Determined
Component: context
Version: Boost 1.61.0
Severity: Problem
Keywords:
Cc:

Description

There has been an issue in basic_fixedsize_stack since at least Boost 1.59 on Windows with MSVC2013 or MSVC2015, in debug builds only, that causes weird crashes in seemingly unrelated Windows API calls and the like. The following simple unit test fails on every Windows machine I have tested so far:

#define BOOST_COROUTINES_UNIDIRECT
#define BOOST_COROUTINES_V2
#include <boost/coroutine2/coroutine.hpp>
// ...

using coro_t = boost::coroutines2::coroutine<int>;

BOOST_AUTO_TEST_CASE(test_windows_boost_bug)
{
  bool result = false;

  auto coro_function = [&](coro_t::push_type& sink) {
#if defined(PLATFORM_WINDOWS)
    char buffer[MAX_PATH];
    // The following simple Windows API call crashes when using MSVC
    // on Windows in debug build only.
    GetModuleFileName(nullptr, buffer, MAX_PATH);
    // Exception thrown at 0x00007FF939A21D58 (ntdll.dll) in
    // test.shift.task.x86_64.vc140.exe: 0xC0000005:
    // Access violation reading location 0xFFFFFFFFFFFFFFFF.

    result = true; // code not reached.
#endif
  };

  coro_t::pull_type{coro_function};
  BOOST_CHECK(result);
}

I stumbled across this bug several times but didn't try to fix it until I realized that it is still present in the recently released Boost 1.61.

Once the code crashes the full stack trace looks like this:

ntdll.dll!LdrGetDllFullName	Unknown
KernelBase.dll!GetModuleFileNameW	Unknown
KernelBase.dll!GetModuleFileNameA	Unknown
>	test.shift.task.x86_64.vc140.exe!test_windows_boost_bug::test_method::__l2::<lambda>	C++
test.shift.task.x86_64.vc140.exe!boost::coroutines2::detail::pull_coroutine<int>::control_block::<lambda>	C++
test.shift.task.x86_64.vc140.exe!std::_Invoker_functor::_Call<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64>	C++
test.shift.task.x86_64.vc140.exe!std::invoke<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),boost::context::execution_context<int * __ptr64>,int * __ptr64>	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply_impl<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64>,0,1>	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::apply<boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *),std::tuple<boost::context::execution_context<int * __ptr64> && __ptr64,int * __ptr64> >	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) >::run	C++
test.shift.task.x86_64.vc140.exe!boost::context::detail::context_entry<boost::context::detail::record<boost::context::execution_context<int * __ptr64>,boost::context::basic_fixedsize_stack<boost::context::stack_traits>,boost::context::execution_context<int *> <lambda>(boost::context::execution_context<int *>, int *) > >	C++
test.shift.task.x86_64.vc140.exe!make_fcontext	Unknown
0000015ef8773e60	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
00000018dad1d500	Unknown
0000015ef8773e80	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
0000000000010000	Unknown
0000000000010000	Unknown
0000015ef8773f20	Unknown
0000015ef8773ec0	Unknown
00000018dad1d984	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown
cdcdcdcdcdcdcdcd	Unknown

It took me a while to figure out what went wrong with the call stack, as I initially suspected a bug in the context switching code. However, the solution turned out to be rather simple: the stack memory allocated by the basic_fixedsize_stack class simply isn't initialized. A simple call to memset fully resolves the issue for me.
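For readers without the attachment at hand, the idea of zeroing the stack right after allocation can be sketched roughly as follows. This is not the attached patch and not Boost's actual allocator: stack_descriptor and zeroing_fixedsize_stack are hypothetical names, and the sketch only assumes that the allocator obtains its memory from std::malloc and hands out a pointer to the top of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for a stack descriptor. `sp` points one past the
// highest byte of the block, because the stack grows downwards.
struct stack_descriptor {
    std::size_t size = 0;
    void*       sp   = nullptr;
};

// Hypothetical fixed-size stack allocator that zeroes its memory.
// The names are illustrative only and are not part of Boost.
class zeroing_fixedsize_stack {
public:
    explicit zeroing_fixedsize_stack(std::size_t size = 64 * 1024)
        : size_(size) {}

    stack_descriptor allocate() const {
        void* base = std::malloc(size_);
        if (!base) throw std::bad_alloc();
        // The step described in the ticket: overwrite whatever pattern the
        // heap returned (0xCD in MSVC debug builds) with zeros.
        std::memset(base, 0, size_);
        stack_descriptor d;
        d.size = size_;
        d.sp   = static_cast<char*>(base) + size_;
        return d;
    }

    void deallocate(stack_descriptor& d) const noexcept {
        assert(d.sp != nullptr);
        // Recover the start of the block from the top-of-stack pointer.
        void* base = static_cast<char*>(d.sp) - d.size;
        std::free(base);
        d = stack_descriptor{};
    }

private:
    std::size_t size_;
};
```

Zeroing the whole block costs one pass over the stack per allocation, so it is a blunt workaround rather than a targeted fix.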

Attachments (1)

boost_1_61_0-context-init-stack.patch (586 bytes) - added by runningwithscythes@… 17 months ago.
patch to initialize stack memory

Change History (7)

Changed 17 months ago by runningwithscythes@…

patch to initialize stack memory

comment:1 Changed 16 months ago by olli

Resolution: fixed
Status: new → closed

thx, fixed

comment:2 in reply to:  1 ; Changed 16 months ago by Alan Wilkie <alan@…>

Replying to olli:

thx, fixed

Just to round this out, I have been chasing the same (or a very similar) issue, and I think the root cause is that the "fbr_strg" entry in the context is not being explicitly initialised. When the initial context switch occurs, it picks up the uninitialised value and writes it to the TIB (especially in debug builds, where new memory is initialised to 0xCD). Some Windows functions consult this value and use it if it's not zero.

Initialising the allocated stack space also zeroes the context and fixes the problem. I think it should also be possible to fix by setting fbr_strg to zero in make_x86_64_ms_pe_masm.asm and make_i386_ms_pe_masm.asm.
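The mechanism described above can be illustrated with a small, portable simulation. It is only a sketch under loud assumptions: fake_context_record is a hypothetical, simplified layout (the real one is defined in the make_*_ms_pe_* assembly files and differs), and the 0xCD fill stands in for what the MSVC debug heap does to fresh allocations.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for the record that is laid out at the
// top of a new stack. Field names and order are assumptions.
struct fake_context_record {
    std::uint64_t fiber_storage;  // stand-in for "fbr_strg"
    std::uint64_t stack_base;
    std::uint64_t stack_limit;
    std::uint64_t entry_point;
};

// Returns a record whose bytes look like fresh memory in an MSVC debug
// build, where the debug heap fills new allocations with 0xCD.
inline fake_context_record debug_heap_record() {
    fake_context_record r;
    std::memset(&r, 0xCD, sizeof r);
    return r;
}

// Simulates the behaviour before the fix: only the fields the code knows
// about are written, so fiber_storage keeps whatever the heap left there.
inline void make_without_fix(fake_context_record& r, std::uint64_t base,
                             std::uint64_t limit, std::uint64_t entry) {
    r.stack_base  = base;
    r.stack_limit = limit;
    r.entry_point = entry;
}

// Simulates the proposed targeted fix: additionally set the fiber-storage
// field to zero, so a later copy into the TIB carries a zero value.
inline void make_with_fix(fake_context_record& r, std::uint64_t base,
                          std::uint64_t limit, std::uint64_t entry) {
    make_without_fix(r, base, limit, entry);
    r.fiber_storage = 0;
}
```

In the unfixed variant the field keeps the non-zero fill pattern, which matches the cdcdcdcdcdcdcdcd entries visible in the stack trace. Zeroing the entire stack hides the problem as a side effect, while zeroing just this one field addresses it directly.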

comment:3 in reply to:  2 ; Changed 16 months ago by olli

makes sense - I've changed the code in branch develop. could you verify the fix, please

comment:4 in reply to:  3 Changed 16 months ago by Alan Wilkie <alan@…>

Replying to olli:

makes sense - I've changed the code in branch develop. could you verify the fix, please

I haven't verified the actual code of the develop branch, but I've made the same change to the 1.60 code and it does fix the crash. Looking at the commit, I assume that corresponding changes would need to be made in make_x86_64_ms_pe_gas.asm and make_i386_ms_pe_masm.asm?

comment:5 Changed 15 months ago by baldzar@…

I am experiencing the same issue using coroutine/context via asio. Actually, the default stack allocator used there is basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).

The fix is the same, zeroing the stack.

comment:6 in reply to:  5 Changed 15 months ago by olli

Replying to baldzar@…:

I am experiencing the same issue using coroutine/context via asio. Actually, the default stack allocator used there is basic_standard_stack_allocator (boost/coroutine/standard_stack_allocator.hpp).

The fix is the same, zeroing the stack.

But the problem seems to be related to the fiber-storage field in the TIB. The fix in 1.62 initializes this field with zeros. Could you verify that this fixes the problem, please?
