Ticket #2400 (reopened Feature Requests)

Opened 6 years ago

Last modified 16 months ago

Messages corrupted if isend requests are not retained

Reported by: dwchiang@… Owned by: troyer
Milestone: Boost 1.37.0 Component: mpi
Version: Boost 1.36.0 Severity: Problem
Keywords: Cc:

Description

If I do a series of isends and discard the request objects (because I don't need to know when they complete), the messages can get corrupted. I realize this is the way the MPI library is designed, but I'm wondering if it is possible to use some C++ goodness to make the request objects persist behind the scenes until the request is completed. The behavior is particularly unexpected in the Python layer. Thanks very much!

Attachments

Change History

comment:1 Changed 6 years ago by marshall

  • Owner set to dgregor
  • Component changed from None to mpi

comment:2 Changed 5 years ago by troyer

  • Owner changed from dgregor to troyer
  • Status changed from new to assigned

comment:3 Changed 5 years ago by troyer

The reason is that when sending anything other than MPI datatypes, the object is serialized by Boost.Serialization into a temporary buffer, which is then associated with the request object. If the request object is discarded, that buffer can be released while the send is still in progress. When using isend with Boost.MPI, one therefore always has to wait for completion of the request and cannot discard it.

comment:4 Changed 5 years ago by troyer

  • Status changed from assigned to closed
  • Resolution set to invalid

comment:5 Changed 5 years ago by troyer

This is not a bug but intended behavior. A note has been added to the documentation.

comment:6 Changed 5 years ago by anonymous

OK, I understand, but in Python one normally does not have to worry about when it is safe to deallocate an object, as the GC is supposed to take care of this for you. Is there no way to increment the Python reference count of the request object or its buffer until it is completed?

comment:7 Changed 5 years ago by troyer

No, the basic issue is that you *have to* call wait on the request to finish the irecv or isend operation and cannot just discard the object. If you do not call wait, the code never checks for completion. There are a few solutions, all of them sub-optimal:

1.) keep the buffer alive as you propose, but that will lead to a memory leak, since the request object is the only thing that knows about the buffer. If you discard the request, nobody ever checks for completion and the buffer is never freed.

2.) have the destructor call wait - but then the code might just hang or deadlock in the destructor, which would be very hard to debug.

3.) have the destructor cancel the request, but this again leads to unexpected behavior: discarding the request automatically cancels the send!

The best option, as far as I can see, is to:

4.) assert on the precondition of the destructor, namely that the request has to be finished. Note that we cannot throw an exception in a destructor, so we would have to abort.

I don't like option 4 either but it seems the safest and will at least tell you that you forgot to wait for completion.
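Option 4 could look roughly like the following. This is a minimal sketch only: `checked_request` is a hypothetical stand-in class, not the actual Boost.MPI `request`, and its `wait()` merely sets a flag where real code would block until the operation completes.

```cpp
#include <cassert>

// Hypothetical stand-in for a request handle; NOT the Boost.MPI request class.
class checked_request {
public:
    checked_request() = default;

    // Non-copyable, so exactly one handle is responsible for completion.
    checked_request(const checked_request&) = delete;
    checked_request& operator=(const checked_request&) = delete;

    // Stand-in for wait(): real code would block here until the
    // nonblocking operation has finished.
    void wait() { finished_ = true; }

    bool finished() const { return finished_; }

    ~checked_request() {
        // A destructor must not throw, so a violated precondition aborts.
        // Note: assert is compiled out when NDEBUG is defined.
        assert(finished_ && "request destroyed before completion");
    }

private:
    bool finished_ = false;
};
```

With this shape, forgetting to call `wait()` fails loudly in debug builds at the point where the handle goes out of scope, instead of silently corrupting the message.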

comment:8 Changed 16 months ago by troyer

  • Status changed from closed to reopened
  • Resolution invalid deleted

We should revisit this ticket since MPI 3 now explicitly allows requests to be discarded and we should try to mimic this. This will require us to keep buffers alive and perform garbage collection from time to time when requests are done.
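One possible shape for this is sketched below, under stated assumptions: `pending_send` and `pending_pool` are hypothetical names, the buffer is modeled as a `std::string`, and completion is modeled as a poll callback standing in for a test on the underlying MPI request. It is not a proposed Boost.MPI implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>

// Hypothetical model of an outstanding nonblocking send: the serialized
// payload plus a poll function that reports whether the send has completed.
struct pending_send {
    std::string buffer;          // must stay alive until done() returns true
    std::function<bool()> done;  // completion poll (an assumption of this sketch)
};

// Hypothetical pool that adopts sends whose request handles were discarded.
class pending_pool {
public:
    // Take ownership of the buffer so it outlives the caller's handle.
    void adopt(std::string buffer, std::function<bool()> done) {
        assert(done && "a completion poll is required");
        pending_.push_back(pending_send{std::move(buffer), std::move(done)});
    }

    // Free every buffer whose operation has completed.
    // Returns the number of buffers released in this pass.
    std::size_t collect() {
        std::size_t freed = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->done()) {
                it = pending_.erase(it);
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::list<pending_send> pending_;
};
```

Calling `collect()` from time to time, for example whenever a new send is adopted, would bound the memory held by discarded requests without ever blocking in a destructor.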
