Ticket #4010 (closed Bugs: fixed)

Opened 4 years ago

Last modified 3 years ago

Boost message queue bug

Reported by: rusty0831 <rusty_lai@…>
Owned by: igaztanaga
Milestone: Boost 1.45.0
Component: interprocess
Version: Boost 1.42.0
Severity: Problem
Keywords: bug message queue temp folder bootstamp
Cc: anders.widen@…

Description

There is a serious bug in the message queue. The Boost message queue is designed to create its temp files under a temp folder with a generated name. Boost uses undocumented Windows APIs to get the bootstamp and builds the folder name from it, so the folder looks like "C:\Documents and Settings\All Users\Application Data/boost_interprocess/D0F325BE8579CA01/". Unfortunately, the method used to generate the bootstamp is buggy: the bootstamp can change even without rebooting!

As a result, if a message queue has been running for hours, new requests from clients cannot connect to it, because the newly generated bootstamp is different from the one the queue was created with.

This bug can be replicated by

[Method 1]

  1. Write a test program (A) that creates a message queue, and leave it running for hours (e.g. 3 hours). When the message queue is created, the folder "C:\Documents and Settings\All Users\Application Data\boost_interprocess\D0F325BE8579CA01" is created. Note the folder name "D0F325BE8579CA01".
  2. Write another test program (B) that connects to the message queue created by test program (A). You will notice that it is unable to connect to that message queue.

You can also find that another folder "C:\Documents and Settings\All Users\Application Data\boost_interprocess\9053E2F2EBC0CA01" is created. Notice that the folder name "9053E2F2EBC0CA01" is different from "D0F325BE8579CA01".

[Method 2]

There is a simpler way to reproduce the issue without waiting for hours. The steps are mostly the same as in [Method 1]; the difference is that you change the system time before running test program (B).

Afterwards test program (B) is unable to connect to test program (A) anymore.
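One way to picture why a clock change breaks the connection is the following portable sketch. It assumes, purely for illustration, that the bootstamp is derived as "current wall-clock time minus uptime"; the ticket does not confirm how the undocumented API computes it. The names derived_boot_time, bootstamp_folder and shared_dir are hypothetical and are not part of Boost.Interprocess.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical model (an assumption, not the real Windows API): the boot
// time is computed as wall-clock time minus uptime, both in arbitrary ticks.
std::uint64_t derived_boot_time(std::uint64_t wall_clock_now, std::uint64_t uptime)
{
   assert(wall_clock_now >= uptime);
   return wall_clock_now - uptime;
}

// Format a boot time as a 16-digit uppercase hex folder name, similar in
// shape to "D0F325BE8579CA01" from the report.
std::string bootstamp_folder(std::uint64_t boot_time)
{
   char buf[17];
   std::snprintf(buf, sizeof buf, "%016llX",
                 static_cast<unsigned long long>(boot_time));
   return std::string(buf);
}

// Hypothetical helper: the directory a process would use for named resources.
std::string shared_dir(const std::string &base,
                       std::uint64_t wall_clock_now, std::uint64_t uptime)
{
   return base + "/boost_interprocess/" +
          bootstamp_folder(derived_boot_time(wall_clock_now, uptime));
}
```

Under this model, program (A) and program (B) agree on the directory only while the wall clock advances in step with the uptime. If the clock is adjusted between the two launches, shared_dir returns a different folder for (B), which matches the symptom of a second folder appearing.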

Change History

comment:1 Changed 4 years ago by rusty0831 <rusty_lai@…>

*Note: What I mean is the "boost::interprocess::message_queue" class.

comment:2 Changed 4 years ago by Anders Widén <anders.widen@…>

This is even more serious! The problem seems to apply to all named Boost.Interprocess resources (e.g. shared memory and named semaphores).

As a workaround I have rebuilt my code without the pre-processor symbols BOOST_INTERPROCESS_HAS_WINDOWS_KERNEL_BOOTTIME and BOOST_INTERPROCESS_HAS_KERNEL_BOOTTIME. I believe this gives me filesystem persistence, but that should be OK since the documentation states that all named resources can have either filesystem or kernel persistence.

comment:3 Changed 4 years ago by Anders Widén <anders.widen@…>

  • Cc anders.widen@… added

comment:4 in reply to: ↑ description Changed 4 years ago by anonymous

Thanks for this bug report, it saved us a lot of time tracing a bug: when the applications have been running for a few hours, they stop communicating correctly with each other if the child process is created dynamically.

I have a simple fix for this that only works on Windows: use windows_shared_memory instead of shared_memory_object. It works when changing the system time; I have not tested leaving it running for a few hours. If anyone is interested in the changes: replace detail::managed_open_or_create_impl<shared_memory_object> m_shmem; with detail::managed_open_or_create_impl<windows_shared_memory, false> m_shmem;, change the header include, and comment out message_queue::remove.

Hope this helps someone on Windows work around this problem for now.

comment:6 Changed 4 years ago by klaas@…

I also noted this bug. It happened when using a shared_memory_object. We have a windows service that keeps running that we communicate with. After a while we had clients that failed to communicate with it.

I tracked down the bug. It seems that NtQuerySystemInformation is used in get_system_time_of_day_information to create the path where the files for sharing the memory are stored. I noticed that after Windows changed the time/date, either through the Windows Time synchronization or manually, NtQuerySystemInformation returned a different boot time than before. Because of this, when another process was started and tried to communicate with the Windows service, it failed because it was looking for the files in a different directory.

I don't really have a suggestion for a decent fix. A workaround could be disabling the Windows Time service that does automatic synchronization (this workaround is untested).

comment:7 Changed 4 years ago by igaztanaga

Try latest trunk code. NtQuerySystemInformation has been replaced with a call to WMI (slower, but I think it's much more robust).

comment:8 Changed 4 years ago by igaztanaga

  • Status changed from new to closed
  • Resolution set to fixed
  • Milestone changed from Boost 1.43.0 to Boost-1.45.0

Fixed for Boost 1.45 in release branch

comment:9 Changed 3 years ago by marek

I have been hit by this bug for a long time too, and have always used a workaround.

I am testing the 1.45 beta release now.

It now seems to work on my Vista (32-bit) OS, but it still fails on legacy XP (32-bit). I don't know about other versions.

So I went back to my workaround for this problem, and it is working smoothly again:

\boost\boost\interprocess\detail\tmp_dir_helpers.hpp

#if defined (BOOST_INTERPROCESS_HAS_WINDOWS_KERNEL_BOOTTIME)
inline void get_bootstamp(std::string &s, bool add = false)
{
   std::string bootstamp;
   winapi::get_last_bootup_time(bootstamp);
+   bootstamp = "";
   if(add){
      s += bootstamp;
   }
   else{
      s = bootstamp;
   }
}
#elif defined(BOOST_INTERPROCESS_HAS_BSD_KERNEL_BOOTTIME)

If anyone else is seeing this, it may be worth reopening this bug.

thanks
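The effect of the added line can be sketched without Windows. In this model the raw_bootstamp parameter stands in for whatever winapi::get_last_bootup_time would return, and blank_it models the added line; get_bootstamp_model and resource_dir are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Model of the patched function. `raw_bootstamp` stands in for the value the
// real code obtains from the OS; `blank_it` models the added line that
// overwrites it with an empty string.
inline void get_bootstamp_model(std::string &s, const std::string &raw_bootstamp,
                                bool blank_it, bool add = false)
{
   std::string bootstamp = raw_bootstamp;
   if(blank_it){
      bootstamp = "";
   }
   if(add){
      s += bootstamp;
   }
   else{
      s = bootstamp;
   }
}

// Hypothetical helper: build the directory used for named resources.
inline std::string resource_dir(const std::string &base,
                                const std::string &raw_bootstamp, bool blank_it)
{
   assert(!base.empty());
   std::string dir = base + "/boost_interprocess/";
   get_bootstamp_model(dir, raw_bootstamp, blank_it, true);
   return dir;
}
```

With blank_it set, two processes that see different bootstamps still resolve to the same directory. The trade-off, assuming the directory is otherwise left alone, is that files in it are no longer separated per boot, so stale resources from an earlier session may need to be removed explicitly.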

comment:10 Changed 3 years ago by andysem

I took a look at the WMI code and noticed that in a few places you pass wide strings to COM methods. Strictly speaking, this is not correct because COM methods accept BSTRs, which are binary incompatible with wide C strings, unless the method implementation treats them as wide C strings. In particular, if it calls SysStringLen on the argument, the result will be undefined.
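To illustrate the incompatibility, here is a portable mock of the BSTR layout: a 4-byte length prefix (in bytes) stored immediately before the characters, followed by a terminator. MockBstr and mock_sys_string_len are hypothetical stand-ins written for this sketch; they are not the COM allocator or the real SysStringLen.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

// Portable mock of the BSTR memory layout (not the real COM allocator):
// [4-byte length in bytes][UTF-16 code units][2-byte terminator]
struct MockBstr
{
   std::vector<unsigned char> storage;

   explicit MockBstr(const std::u16string &text)
      : storage(sizeof(std::uint32_t) + (text.size() + 1) * sizeof(char16_t), 0)
   {
      // The prefix is 32 bits wide, so the byte length must fit in it.
      assert(text.size() * sizeof(char16_t) <=
             std::numeric_limits<std::uint32_t>::max());
      const std::uint32_t byte_len =
         static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
      std::memcpy(storage.data(), &byte_len, sizeof byte_len);
      std::memcpy(storage.data() + sizeof byte_len, text.data(), byte_len);
   }

   // Pointer a callee would receive: it points at the characters,
   // just past the length prefix.
   const unsigned char *chars() const
   {
      return storage.data() + sizeof(std::uint32_t);
   }
};

// Mock of a length query that trusts the prefix stored before the characters.
std::uint32_t mock_sys_string_len(const unsigned char *bstr_chars)
{
   if(bstr_chars == nullptr){
      return 0;
   }
   std::uint32_t byte_len = 0;
   std::memcpy(&byte_len, bstr_chars - sizeof byte_len, sizeof byte_len);
   return byte_len / sizeof(char16_t);
}
```

A plain wide C string has no such prefix, so a callee that reads the bytes in front of the pointer would be reading unrelated memory. On Windows, the usual way to avoid this is to allocate the argument with SysAllocString and release it with SysFreeString.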

FWIW, from this thread it seems that using WMI has its drawbacks, let alone the complexity. Perhaps using performance counters would suffice? Here I found an example of reading a few system counters, of which "\Process(System)\Elapsed Time" might be what you need. The registry key also looks interesting but I didn't find a way to interpret it (looks like it changed its format in Vista). I did not dig deep enough into the code though, so this may be of little help to you.

Last edited 3 years ago by andysem