Opened 5 years ago

Closed 5 years ago

#9662 closed Bugs (fixed)

High CPU usage for steady_clock waitable timer on Windows (integer overflow problem)

Reported by: wyatt@… Owned by: chris_kohlhoff
Milestone: To Be Determined Component: asio
Version: Boost 1.55.0 Severity: Problem
Keywords: asio, SetWaitableTimer, basic_waitable_timer, steady_clock Cc:

Description

Hey Guys & Girls,

I've tracked down a bug in your waitable timer implementation on Windows, specifically in the update_timeout() function in "boost/asio/detail/impl/win_iocp_io_service.ipp". More specifically, the problem involves the SetWaitableTimer() call in that function for certain "timeout_usec" values.

For certain large wait values, the current implementation thrashes the CPU, using up nearly all the CPU time on one core of the processor. Obviously that's not ideal.

I haven't tracked down the exact place where things go wrong, nor have I written a patch. If I get some free time in a month or 2 I might do it if no one else wants to pick it up.

Here's how you can reproduce it. (If you want I can put together a full example for you)

typedef boost::asio::basic_waitable_timer<boost::chrono::steady_clock> monotonic_timer;



monotonic_timer buggy_timer_(io_service_);

buggy_timer_.expires_from_now(boost::chrono::seconds(86400));
buggy_timer_.async_wait(boost::bind(&server::callback_function, this, boost::asio::placeholders::error));

io_service_.run();

The problem lies with this:

boost::chrono::seconds(86400)

If you use a value half as large (43200 seconds) then the program uses normal CPU levels (i.e. around 0% CPU). Obviously we've got an integer overflow problem and faulty logic. As I said, I believe it's in the update_timeout() function in "boost/asio/detail/impl/win_iocp_io_service.ipp"
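For a rough sense of scale on the suspected overflow, the sketch below works out how many microseconds the wait values in this report contain and whether that count would fit in a signed 32-bit integer. This is arithmetic only: it assumes a 32-bit microsecond count is involved somewhere, which this ticket does not confirm, and the helper names to_usec and fits_in_int32_usec are hypothetical, not part of Boost.Asio.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Boost.Asio: microseconds in a whole
// number of seconds, computed in 64 bits so this step cannot overflow
// for the values discussed in this ticket.
constexpr std::int64_t to_usec(std::chrono::seconds s)
{
    return static_cast<std::int64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(s).count());
}

// Hypothetical helper: would that microsecond count survive being stored
// in a signed 32-bit integer? (Assumption: such a narrowing happens
// somewhere; the ticket does not confirm this.)
constexpr bool fits_in_int32_usec(std::chrono::seconds s)
{
    return to_usec(s) <= std::numeric_limits<std::int32_t>::max()
        && to_usec(s) >= std::numeric_limits<std::int32_t>::min();
}

// The 30-second wait fits; both large values from this report do not.
static_assert(fits_in_int32_usec(std::chrono::seconds(30)), "30 s fits");
static_assert(!fits_in_int32_usec(std::chrono::seconds(43200)), "43200 s does not fit");
static_assert(!fits_in_int32_usec(std::chrono::seconds(86400)), "86400 s does not fit");
```

Under that assumption, any wait longer than about 2147 seconds would be at risk, so this arithmetic alone does not explain why 43200 seconds behaves differently from 86400 seconds.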

If you need more information just ask and I'll be glad to provide it.

Attachments (1)

BoostMonotonicBug.zip (3.8 KB) - added by wyatt@… 5 years ago.
Simple example that reproduces this problem.


Change History (3)

Changed 5 years ago by wyatt@…

Attachment: BoostMonotonicBug.zip added

Simple example that reproduces this problem.

comment:1 Changed 5 years ago by wyatt@…

Severity: Problem → Showstopper

I've put together a simple example (see attached BoostMonotonicBug.zip). Just compile it and run it and you'll see the 100% CPU bug on one CPU core. This is a very serious bug. Anyone who accepts user-defined lengths of time that are then used in a basic_waitable_timer<steady_clock> will be affected by this 100% CPU bug.

I'm going to dig into this bug further. But here's what I know so far:

  1. Small values like 30 seconds show no problem whatsoever. Everything just works with around 0% CPU.
  2. Some large values like 86400 seconds show 100% CPU usage immediately (whether it ever drops back down to 0% I don't know; it certainly didn't in the hours of testing I've done so far).
  3. Some other large values like 43200 seconds show 0% CPU usage for a short time (around 2 hours), and then the CPU usage shoots up to 100%.

Here's what I suspect, but haven't yet confirmed:

  1. This smells like an integer overflow or integer comparison bug.
  2. This seems like a Windows-only bug (I ran the example on Linux & Mac OS X and it was 0% CPU usage for the hour or so I ran the test).

I have quite a bit of C++ experience, but not so much with the boost internals. If someone with experience with this particular implementation would spend an hour or 2 chatting with me we could solve this post-haste.

I can certainly step through the boost implementation code line-by-line, but the process would be a good deal faster if I could chat with someone who knows this code already.

At any rate, I'll continue to chip away at this problem.

comment:2 Changed 5 years ago by chris_kohlhoff

Resolution: fixed
Severity: Showstopper → Problem
Status: new → closed