Opened 10 years ago

Last modified 6 years ago

#2400 reopened Feature Requests

Messages corrupted if isend requests are not retained

Reported by: dwchiang@… Owned by: Matthias Troyer
Milestone: Boost 1.37.0 Component: mpi
Version: Boost 1.36.0 Severity: Problem
Keywords: Cc:


If I do a series of isends and discard the request objects (because I don't need to know when they complete), the messages can get corrupted. I realize this is the way the MPI library is designed, but I'm wondering if it is possible to use some C++ goodness to make the request objects persist behind the scenes until the request is completed? The behavior is particularly unexpected in the Python layer. Thanks very much!

Change History (8)

comment:1 Changed 10 years ago by Marshall Clow

Component: None → mpi
Owner: set to Douglas Gregor

comment:2 Changed 9 years ago by Matthias Troyer

Owner: changed from Douglas Gregor to Matthias Troyer
Status: new → assigned

comment:3 Changed 9 years ago by Matthias Troyer

The reason is that when sending anything other than MPI datatypes, the object is serialized by Boost.Serialization into a temporary buffer, which is then associated with the request object. Using isend with Boost.MPI one always has to wait for completion of the request and cannot discard it.

comment:4 Changed 9 years ago by Matthias Troyer

Resolution: invalid
Status: assigned → closed

comment:5 Changed 9 years ago by Matthias Troyer

This is not a bug but intended behavior. A note has been added to the documentation.

comment:6 Changed 9 years ago by anonymous

OK, I understand, but in Python one normally does not have to worry about when it is safe to deallocate an object, as the GC is supposed to take care of this for you. Is there no way to increment the Python reference count of the request object or its buffer until it is completed?

comment:7 Changed 9 years ago by Matthias Troyer

No, the basic issue is that you *have to* call wait on the request to finish the irecv or isend operation and cannot just discard the object. If you do not call wait, the code never checks for completion. There are a few possible solutions, all of them sub-optimal:

1.) keep the buffer alive, as you propose, but that leads to a memory leak: the request object is the only one that knows about the buffer, so if you discard the request, nobody ever checks for completion and the buffer is never freed.

2.) have the destructor call wait - but then the code might just hang or deadlock in the destructor, which would be very hard to debug.

3.) have the destructor cancel the request, but that again leads to unexpected behavior: discarding the request automatically cancels the send!

The best option I can see is to:

4.) assert on the precondition of the destructor, namely that the request has been completed. Note that we cannot throw an exception from a destructor and would have to abort.

I don't like option 4 either, but it seems the safest and will at least tell you that you forgot to wait for completion.

comment:8 Changed 6 years ago by Matthias Troyer

Resolution: invalid
Status: closed → reopened

We should revisit this ticket since MPI 3 now explicitly allows requests to be discarded, and we should try to mimic this. This will require us to keep the buffers alive and garbage-collect them from time to time as the requests complete.

Note: See TracTickets for help on using tickets.