Modify

Ticket #9079 (closed Bugs: fixed)

Opened 8 months ago

Last modified 7 months ago

Condition variable will wait forever for some timepoint values (MSVC)

Reported by: mjklaim@… Owned by: viboes
Milestone: Boost 1.55.0 Component: thread
Version: Boost 1.54.0 Severity: Problem
Keywords: Cc: s.heil@…

Description

Original discussion, including test cases and a quick fix: boost.2283326.n4.nabble.com/thread-Can-boost-this-thread-sleep-for-totally-stop-the-thread-td4650070.html

Summary: In some conditions, reproduced in the following repro cases, boost::condition_variable::wait_until/wait_for will wait forever.


I'm using Boost 1.54, Visual Studio 2012 Update 3.

Yesterday night I decided to try to build a repro case and now I think I see at least one of the problems but I need confirmation. I managed to build 2 repro cases at the end of this email, one very simple (A) and one a bit more complex and closer to what I actually do (B).

Unfortunately, I am not sure if the problem in A, that I can understand, is really exactly the same as the problem in B.

In A, the thread stops immediately, without even hitting the "TICK" log. Apparently the problem, even with the initial case which used this_thread::sleep_until, is related to condition_variable::wait_until. Basically, if the time given is 1 millisecond past the "now" timepoint used into the wait_until function, the thread will wait forever.

Here is the implementation of wait_until in Boost 1.54:

template <class Clock, class Duration> 
        cv_status 
        wait_until( 
                unique_lock<mutex>& lock, 
                const chrono::time_point<Clock, Duration>& t) 
        { 
          using namespace chrono; 
          do_wait(lock, ceil<milliseconds>(t-Clock::now()).count()); 
          return Clock::now() < t ? cv_status::no_timeout : 
                                             cv_status::timeout; 
        } 

My current hypothesis is that the ceil() call in this particular case doesn't do what is intended. Here is the implementation of ceil():

/** 
     * rounds up 
     */ 
    template <class To, class Rep, class Period> 
    To ceil(const duration<Rep, Period>& d) 
    { 
        To t = duration_cast<To>(d); 
        if (t < d) 
            ++t; 
        return t; 
    } 

Assuming the call to Clock::now() in wait_until() gives exactly t + 1ms, then ceil, if I'm correct will return 0. The call do_wait( 0 ) seems to wait forever, but I don't understand the details of the implementation. The same exact problem occurs with wait_for() (by the way, why does it always have to use steady_clock? see the implementation)

Suggestion: add a check so that do_wait() returns immediately if the given time is <= 0. I'm assuming that the time given to the wait_until() call is always based on int-kind of value... if that's not the case, it might be harder to get right.

However, in B, it is a bit more complex: the endless wait don't occur immediately but I managed to make it occur very quickly in my current setup at least. It seems that the problem is similar, but I couldnt' reproduce the behaviour of B in A by modifying the time to wait until, only (now - 1ms) seems to work.

In B, we first acquire the current time point, then do some work, then call wait_until(). I believe it is certainly a variation of the bug in A but this is what happen in my application. It takes much more time to trigger the endless wait in my application but you can tweak the values to get the same result, for example by setting MIN_TIME_TASKS_SPAWN_FREQUENCY to 125 milliseconds (it's the value in my app).

Basically, the problem seems to be that I introduce delay by working between the acquisition of the timepoint and the wait_until() call, which makes possible to have a duration value calculated into wait_until which would be ceild to 0, which would trigger the endless wait.

First, please tell me if you can reproduce the problem. Second, I'll try to see if adding the (duration <= 0) check in do_wait() does fix the problem for me. I can reproduce it easily now so I should be able to figure it out today.


// A 

#include <boost/log/trivial.hpp> 
#include <boost/chrono.hpp> 
#include <boost/thread.hpp> 
#include <boost/thread/condition_variable.hpp> 

typedef boost::chrono::high_resolution_clock Clock; 
typedef Clock::time_point TimePoint; 

inline TimePoint real_time_now() { return Clock::now(); } 

int main() 
{ 
using namespace boost::chrono; 

boost::condition_variable m_task_spawn_condition; 

boost::mutex main_thread_mutex; 
boost::unique_lock<boost::mutex> main_thread_lock( main_thread_mutex ); 

BOOST_LOG_TRIVIAL(info) << "[TaskScheduler::run_and_wait] Scheduling loop - 
BEGIN"; 

while( true ) 
{ 
static const auto TIME_BACK = milliseconds(1); 
m_task_spawn_condition.wait_until( main_thread_lock, real_time_now() - 
TIME_BACK ); // wait forever 
//m_task_spawn_condition.wait_for( main_thread_lock, - TIME_BACK ); // same 
problem 
BOOST_LOG_TRIVIAL(trace) << "TICK"; 
} 

} 


// B 

#include <atomic> 
#include <boost/log/trivial.hpp> 
#include <boost/chrono.hpp> 
#include <boost/thread.hpp> 
#include <boost/thread/condition_variable.hpp> 

typedef boost::chrono::high_resolution_clock Clock; 
typedef Clock::time_point TimePoint; 

inline TimePoint real_time_now() { return Clock::now(); } 


class Foo 
{ 
std::atomic<bool> m_is_exiting; 
TimePoint m_next_tick_time; 

public: 

bool is_exiting() const { return m_is_exiting; } 

TimePoint spawn_tasks() // note that in my app, this call takes more time 
than here 
{ 
using namespace boost::chrono; 
const auto now = real_time_now(); 

if( m_next_tick_time < now ) 
{ 
m_next_tick_time = now + seconds(1); 
BOOST_LOG_TRIVIAL(info) << "TICK!"; 
} 

return m_next_tick_time; 
} 

}; 


int main() 
{ 
using namespace boost::chrono; 
static const auto MIN_TIME_TASKS_SPAWN_FREQUENCY = milliseconds(1); 
//microseconds(1); // THE SHORTER THE QUICKER TO REPRODUCE THE BUG 

boost::condition_variable m_task_spawn_condition; 
Foo foo; 

boost::mutex main_thread_mutex; 
boost::unique_lock<boost::mutex> main_thread_lock( main_thread_mutex ); 

BOOST_LOG_TRIVIAL(info) << "[TaskScheduler::run_and_wait] Scheduling loop - 
BEGIN"; 

while( !foo.is_exiting() ) 
{ 
const auto next_task_spawn_time = foo.spawn_tasks(); 

const auto now = real_time_now(); 
const auto next_minimum_spawn_time = now + MIN_TIME_TASKS_SPAWN_FREQUENCY; 
 const auto next_spawn_time = 
next_task_spawn_time > TimePoint() && next_task_spawn_time < 
next_minimum_spawn_time ? 
next_task_spawn_time : next_minimum_spawn_time; 

const auto wait_time = next_spawn_time - now; 
if( wait_time > wait_time.zero() ) 
{ 
BOOST_LOG_TRIVIAL(trace) << "WAIT TIME: " << wait_time; // UNCOMMENT THIS: 
MAKES IT WORKS. WAT?????? 
auto reason = m_task_spawn_condition.wait_until( main_thread_lock, 
next_spawn_time ); // DON'T WORK: WILL WAIT IF next_spawn_time is too close! 
} 

} 


} 


Diving into the code, I just realize the real source of the problem:

 struct BOOST_SYMBOL_VISIBLE timeout
        {
            win32::ticks_type start;
            uintmax_t milliseconds;  // HERE
            bool relative;
            boost::system_time abs_time;

            static unsigned long const max_non_infinite_wait=0xfffffffe;

            timeout(uintmax_t milliseconds_): // HERE 

do_wait() takses a timeout which will use a unsined int as milliseconds, which will make any negative number very big. Assuming ceil() returns a negative number (I might have made a mistake in my precedent mail), then we get a very big timeout....


I tried replacing uintmax_t by long long in this quoted code, as a quick-fix, and it seems to work in my repro-cases.

...

It fixes my application too.

Attachments

Change History

comment:1 Changed 8 months ago by anonymous

The CMake config:

cmake_minimum_required( VERSION 2.8.11 )

if( WIN32 AND CMAKE_SIZEOF_VOID_P EQUAL 8 ) 
		# On Windows 64bit: check that the right boost binaries are set before continuing
		if( NOT DEFINED BOOST_LIBRARYDIR OR BOOST_LIBRARYDIR STREQUAL "BOOST_LIBRARYDIR-NOT-SET" )
			set( BOOST_LIBRARYDIR "BOOST_LIBRARYDIR-NOT-SET" CACHE PATH "Location of the Boost library binaries" FORCE )
			message( FATAL_ERROR "BOOST_LIBRARYDIR is not set (we need the 64bit version). Before continuing, please set it to the correct binary path (depending on if you want to link with 32 or 64bit version)." )
		endif()
		
	endif()
	
# Force Boost.Thread V4 use for all projects, including dependencies.
add_definitions( -DBOOST_THREAD_VERSION=4 -DBOOST_RESULT_OF_USE_DECLTYPE )

set( Boost_USE_STATIC_LIBS        ON )
set( Boost_USE_MULTITHREADED      ON )
set( Boost_USE_STATIC_RUNTIME    OFF )
find_package( Boost 1.54.0 REQUIRED COMPONENTS 
	system 
	date_time 
	filesystem 
	chrono
	thread
	program_options
	log
	log_setup
	iostreams
)

if( NOT Boost_FOUND )
	message( SEND_ERROR "Boost libraries NOT FOUND!" )
endif()

include_directories( ${Boost_INCLUDE_DIR} )

add_executable( test main.cpp )
target_link_libraries( test ${Boost_LIBRARIES} )

I tested only in 32bit mode.

comment:2 Changed 8 months ago by viboes

  • Owner changed from anthonyw to viboes
  • Status changed from new to assigned

Thanks for this long and useful report.

The examples seems realistic. I will try to fix it soon.

comment:3 Changed 8 months ago by s.heil@…

  • Cc s.heil@… added

comment:4 Changed 7 months ago by viboes

  • Summary changed from Condition variable will wait forever for some timepoint values. to Condition variable will wait forever for some timepoint values (MSVC)

comment:5 Changed 7 months ago by viboes

  • Milestone changed from To Be Determined to Boost 1.55.0

Please, could you try after updating with the following

Changeset [85591],[85592]

Thread: protect condition_variable/_any wait_for and wait_until from …

comment:6 Changed 7 months ago by mjklaim@…

  • I copy/pasted the boost/thread directory from trunk into my boost/thread directory, overwriting any change but keeping Boost.1.54 for all the other libraries;
  • all my isolated tests runs apparently correctly;
  • my application's tests runs apparently correctly;
  • my application runs apparently correctly (previously it freezed after some time);

So I think the fix works.

comment:7 Changed 7 months ago by viboes

  • Status changed from assigned to closed
  • Resolution set to fixed

Committed revision [85663].

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.