Ticket #7961 (closed Bugs: fixed)

Opened 15 months ago

Last modified 11 months ago

handle_connect called before the socket was actually connected

Reported by: Nikki Chumakov <nikkikom@…>
Owned by: chris_kohlhoff
Milestone: To Be Determined
Component: asio
Version: Boost 1.52.0
Severity: Problem
Keywords: asio epoll epoll_reactor
Cc:

Description

Environment: RedHat 5.8, kernel 2.6.32.26-17.el5, glibc 2.5-81.el5_8.7. The problem was detected with Boost 1.49 and was also confirmed with 1.53b1.

Problem: handle_connect (the connect completion handler) can be called before the TCP handshake completes.

Unfortunately I could not strip my application down to a reasonable size, so I prefer not to post it. There is a mail thread related to this bug at

I believe there is a bug in epoll_reactor, in the way it handles the EPOLLHUP event on not-yet-connected sockets. Below I explain the details and symptoms.

Here is strace output of such connects:

[pid 25441] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)      = 9

[pid 25441] epoll_ctl(5, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1811961744, u64=140515962017680}} <unfinished ...>
[pid 25442] epoll_wait(5,  <unfinished ...>
[pid 25441] <... epoll_ctl resumed> )   = 0
[pid 25442] <... epoll_wait resumed> {{EPOLLOUT, {u32=1811958752, u64=140515962014688}}, {EPOLLIN, {u32=1811943784, u64=140515961999720}}, {EPOLLOUT|EPOLLHUP, {u32=1811961744, u64=140515962017680}}}, 128, 0) = 3

[pid 25441] ioctl(9, FIONBIO, [1])  = 0

[pid 25441] connect(9, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("xxx.xxx.193.11")}, 16)     = -1 EINPROGRESS (Operation now in progress)
*********** no epoll_wait after connect **********

[pid 25441] epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1811961744, u64=140515962017680}})   = 0

*********** calling handle_connect
[pid 25442] getsockopt(9, SOL_SOCKET, SO_ERROR, [7782250667543887872], [4]) = 0
[pid 25442] getpeername(9, 0x40450360, [140514150055964]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 25442] write(2, "connect error: ", 15connect error: ) = 15
[pid 25442] write(2, "Transport endpoint is not connec"..., 35Transport 
endpoint is not connected) = 35
[pid 25442] write(2, "\n", 1
)           = 1

As one can see, there is no epoll_wait after the ::connect call, yet the connect handler was called.

So asio calls ::connect and then immediately calls the user's handle_connect handler, without calling (and waiting for) epoll_wait between ::connect and handle_connect. Thus handle_connect is called before the socket is actually connected.
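The strace above shows getpeername failing with ENOTCONN inside the handler. As a defensive measure on an affected version, a handler could make the same check itself before using the socket. The following is only a sketch: `has_peer` is a hypothetical helper name, not part of asio, and it takes a raw descriptor (in asio code the equivalent check would presumably be the socket's `remote_endpoint(ec)` call).

```cpp
#include <sys/socket.h>

// Hypothetical helper (not part of asio): returns true only if the kernel
// reports a connected peer for the descriptor. On an unconnected socket
// getpeername fails with ENOTCONN, as in the strace output above.
bool has_peer(int fd)
{
  sockaddr_storage addr;
  socklen_t len = sizeof(addr);
  return getpeername(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0;
}
```

A handler that sees `has_peer` return false even though it received a success code has hit the situation described in this ticket. This only detects the premature call; it does not fix the reactor.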

What may happen is:

  1. The main thread calls do_open and adds the socket to the epoll queue.
  2. The service thread calls epoll_wait, which returns several events INCLUDING one for that socket.
  3. The main thread calls async_connect (and modifies the socket in the epoll queue, but that does not matter at this point).
  4. The service thread processes the events it got from epoll_wait at step #2 in a loop, and when it processes that socket, the connect completion handler is called.
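Steps 1 and 2 can be checked in isolation with a small Linux-only probe. This is a sketch, not asio code: `probe_unconnected_socket` is a hypothetical name, and it simply registers a fresh TCP socket with the same event mask seen in the strace and asks epoll what is already pending.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical probe (not part of asio). Returns the epoll event mask
// reported for a registered but never-connected TCP socket, 0 if no event
// was pending, or -1 if a system call failed.
long probe_unconnected_socket()
{
  int epfd = epoll_create1(0);
  if (epfd < 0) return -1;

  int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
  if (fd < 0) { close(epfd); return -1; }

  // Same registration mask as the EPOLL_CTL_ADD call in the strace.
  epoll_event ev = {};
  ev.events = EPOLLIN | EPOLLPRI | EPOLLOUT | EPOLLERR | EPOLLHUP | EPOLLET;
  ev.data.fd = fd;

  long result = -1;
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
    epoll_event out = {};
    int n = epoll_wait(epfd, &out, 1, 0);  // timeout 0: do not block
    if (n >= 0) result = (n == 1) ? static_cast<long>(out.events) : 0;
  }

  close(fd);
  close(epfd);
  return result;
}
```

In the strace above the kernel reported EPOLLOUT|EPOLLHUP for such a socket; the mask returned by this probe may differ on other kernels. If an event is reported here, a service thread can already be holding it when the main thread later calls connect.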

A possible workaround is to ignore EPOLLHUP in epoll_reactor::descriptor_state::perform_io() until the socket reaches the 'connected' state.

I'm attaching a patch that works for me. It needs to be reviewed carefully because of possible unwanted side effects (e.g. lost socket disconnect error notifications).
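The idea of the workaround can be illustrated as a mask filter. This is not the attached patch and not asio's actual perform_io code; `filter_events` and its `connected` flag are hypothetical names used only to show the intent.

```cpp
#include <sys/epoll.h>
#include <cstdint>

// Hypothetical illustration (not the attached patch, not asio code):
// drop EPOLLHUP from a reported event mask while the socket has not yet
// reached the connected state, and pass the mask through unchanged afterwards.
inline std::uint32_t filter_events(std::uint32_t events, bool connected)
{
  if (!connected)
    events &= ~static_cast<std::uint32_t>(EPOLLHUP);
  return events;
}
```

With this filter, a mask of EPOLLOUT|EPOLLHUP on an unconnected socket is reduced to EPOLLOUT, so the caller would still need to decide how to treat a writable event that predates the connect call. Masking EPOLLHUP also carries the risk of hiding a genuine hang-up.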

Attachments

boost-1.53.0b1-asio-epoll-reactor.patch Download (1.7 KB) - added by Nikki Chumakov <nikkikom@…> 15 months ago.
Suggested patch for boost-1.53b1

Change History

Changed 15 months ago by Nikki Chumakov <nikkikom@…>

Suggested patch for boost-1.53b1

comment:1 Changed 14 months ago by anonymous

We're seeing this happen with Boost 1.50.0 in our newly-ported-to-Linux code, and it's having some other effects as well.

When the erroneous connect occurs, we start sending data and then read back the same data we are sending out. It looks like the send and receive buffers are getting confused, although it will be a little while before we have time to dig in and create a minimal test case.

comment:2 Changed 11 months ago by chris_kohlhoff

Fixed on trunk in [84349]. Merged to release in [84388].

comment:3 Changed 11 months ago by chris_kohlhoff

  • Status changed from new to closed
  • Resolution set to fixed