Modify

Ticket #3926 (assigned Patches)

Opened 4 years ago

Last modified 2 months ago

thread_specific_ptr + dlopen library causes a SIGSEGV.

Reported by: pluto@… Owned by: viboes
Milestone: To Be Determined Component: thread
Version: Boost 1.42.0 Severity: Problem
Keywords: Cc:

Description

hi,

i've discovered that using thread_specific_ptr in shared library dlopened from thread causes a gpf.
please consider following scenario:

spawn thread
dlopen a shared library with thread_specific_ptr
dlclose the library
terminate the thread

observe the gpf

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff62b7400 in ?? ()
(gdb) bt
#0  0x00007ffff62b7400 in ?? ()
#1  0x00007ffff7221f79 in __nptl_deallocate_tsd () at pthread_create.c:154
#2  0x00007ffff722291b in start_thread (arg=<value optimized out>) at pthread_create.c:304
#3  0x00007ffff6f9293d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

afaics the pthread (user provided) destructor keyed to pthread specific data is called on dlunloaded code.

BR,
Pawel.

Attachments

tls-test.boost.tgz Download (1.7 KB) - added by anonymous 4 years ago.
boost-tls.patch Download (1.3 KB) - added by pluto@… 4 years ago.
proposed patch for proper cleanup of the static libboost_thread linked into shared objects.
boost_thread.patch Download (1.7 KB) - added by dadrus 3 years ago.
patch for linux and mac

Change History

Changed 4 years ago by anonymous

comment:1 follow-up: ↓ 2 Changed 4 years ago by anthonyw

  • Status changed from new to closed
  • Resolution set to wontfix

When you dlclose a library, you must ensure that no code references that library afterwards. If you have a live thread_specific_ptr with a destructor that belongs to that library then the library must be loaded when the destructor is called. You must therefore ensure that the thread_specific_ptr does not have a value on any thread when the library is unloaded.

comment:2 in reply to: ↑ 1 Changed 4 years ago by pluto@…

  • Status changed from closed to reopened
  • Resolution wontfix deleted

Replying to anthonyw:

When you dlclose a library, you must ensure that no code references that library afterwards. If you have a live thread_specific_ptr with a destructor that belongs to that library then the library must be loaded when the destructor is called. You must therefore ensure that the thread_specific_ptr does not have a value on any thread when the library is unloaded.

the thread_specific_ptr dtor called from dlclose() calls set_tss_data() with boost default cleanup function and *NULL* new_value. what's the point of deleting NULL pointer in cleanup function? this accidentally works in shared libs enviroment beacause the cleanup function exists in memory and 'delete 0' is valid in c++. with dynamically loaded libs glibc tries to call useless cleanup callback from dl-unloaded code and ends with gpf. solution: the thread_specific_ptr dtor should delete current_value and call set_tss_data() in the same way as the release() does.

comment:3 Changed 4 years ago by pluto@…

currently there's a crash in boost.thread core. please follow this scenario:

start thread.
  dlopen lib.
    run function from lib which uses thread_specific_ptr.
      boost::detail::create_epoch_tss_key registers delete_epoch_tss_data destructor.
    exit from lib function.
  dlclose lib (unload delete_epoch_tss_data code).
exit thread
  __nptl_deallocate_tsd tries to call unloaded key destructor and crashes.

i think that libs/thread/src/pthread/once.cpp should contain alternative cleanup function with attributte(( destructor )):

if ( pthread_getspecific( epoch_tss_key ) )
  pthread_key_delete( epoch_tss_key )

Changed 4 years ago by pluto@…

proposed patch for proper cleanup of the static libboost_thread linked into shared objects.

comment:4 Changed 3 years ago by dadrus

The patch provided above is not applicable on apple. At least on Mac OSX 10.5 PTHREAD_ONCE_INIT is defined as

#define PTHREAD_ONCE_INIT {_PTHREAD_COND_SIG_init, {0}}

and not as

#define PTHREAD_ONCE_INIT 0

like on Linux, so one can not use simple compare operator. Here comes the patch, which works on both, linux and mac. I'm still using 1.38 version of boost. So one has to adjust it for newer versions.

Changed 3 years ago by dadrus

patch for linux and mac

comment:5 Changed 3 years ago by dadrus

There is a small typo in my last comment. On Mac PTHREAD_ONCE_INIT is defined as follows:

#define PTHREAD_ONCE_INIT {_PTHREAD_ONCE_SIG_init, {0}}

I would like also to ask you Anthony to tackle this ticket soon. Hopefully for the next boost release.

Regards

Dimitrij

comment:6 Changed 3 years ago by tkramer@…

I ran into this recently as well. Any updates on getting this patch committed?

comment:7 Changed 2 years ago by viboes

  • Owner changed from anthonyw to viboes
  • Status changed from reopened to new
  • Type changed from Bugs to Patches
  • Milestone changed from Boost 1.43.0 to To Be Determined

comment:8 Changed 2 years ago by viboes

I don't know nothing about dl_open/dl_close. Please, could you explain me when and why delete_epoch_tss_key_on_dlclose will be called?

comment:9 Changed 2 years ago by viboes

Oh, I see now it is the gcc attributte(( destructor )) that make it be called after main().

Shouldn't this function be added conditionally (when this attribute is available)?

What about compilers that don't support this attribute? How the the key will be deleted?

comment:10 Changed 2 years ago by viboes

  • Status changed from new to assigned

comment:11 follow-up: ↓ 12 Changed 2 years ago by viboes

See also #4639 boost thread library leaks pthread_key

Last edited 7 months ago by viboes (previous) (diff)

comment:12 in reply to: ↑ 11 ; follow-up: ↓ 14 Changed 2 years ago by pluto@…

Replying to viboes:

See also #4639 boost thread library leaks pthread_key

yes, the #4639 describes the same problem but there's one major issue with both patches. the desctruction order of global objects is not specified in general, especially in these days when recent gnu toolchain can sort .init/.fini-array sections.

e.g. the ~delete_epoch_tss_key_on_dlclose_t() from #4639 may release pthread_key and after this the ~thread_specific_ptr() may accuire key again, set null handler and finally leak/gpf as usual.

Last edited 7 months ago by viboes (previous) (diff)

comment:13 Changed 2 years ago by viboes

See also #2470 Problem with threads started in a dynamically loaded bundle under Mac OS X

comment:14 in reply to: ↑ 12 Changed 8 months ago by viboes

Replying to pluto@…:

Replying to viboes:

See also #4639 boost thread library leaks pthread_key

yes, the #4639 describes the same problem but there's one major issue with both patches. the desctruction order of global objects is not specified in general, especially in these days when recent gnu toolchain can sort .init/.fini-array sections.

e.g. the ~delete_epoch_tss_key_on_dlclose_t() from #4639 may release pthread_key and after this the ~thread_specific_ptr() may accuire key again, set null handler and finally leak/gpf as usual.

comment:15 follow-up: ↓ 16 Changed 3 months ago by kai.dietrich@…

Hi all,

I also got bitten by that one. Our application compiles boost statically into a shared library (.so/linux). Users are supposed to dl_open/dl_close (potentially often). Each open/close cycle leads to a key/handle leak due to boost not freeing the handle and not giving any control over the tls handles to the user.

At the moment the tls keys in thread.cpp and once.cpp are internal and not accessible from the outside. Would it be possible to give out a way to the user. The windows implementation also has some pretty verbose tls cleanup code.

This bug can be a real production showstopper. I'd give it a way higher priority.

comment:16 in reply to: ↑ 15 Changed 3 months ago by viboes

Replying to kai.dietrich@…:

Hi all,

I also got bitten by that one. Our application compiles boost statically into a shared library (.so/linux). Users are supposed to dl_open/dl_close (potentially often). Each open/close cycle leads to a key/handle leak due to boost not freeing the handle and not giving any control over the tls handles to the user.

At the moment the tls keys in thread.cpp and once.cpp are internal and not accessible from the outside. Would it be possible to give out a way to the user. The windows implementation also has some pretty verbose tls cleanup code.

This bug can be a real production showstopper. I'd give it a way higher priority.

I know and any help in this direction is welcome.

comment:17 Changed 2 months ago by anonymous

As said, the win32 implementation has this:

  namespace boost {
    namespace detail {

      BOOST_THREAD_DECL void __cdecl on_process_enter()
      {}

      BOOST_THREAD_DECL void __cdecl on_thread_enter()
      {}

      BOOST_THREAD_DECL void __cdecl on_process_exit()
      {
        boost::cleanup_tls_key();
      }

      BOOST_THREAD_DECL void __cdecl on_thread_exit()
      {
        ...
      }

    }
  }

These functions are then somehow added into the PE image destructor sections (tss_dll.cpp and tss_pe.cpp). This was also buggy in some boost versions and messed up MFC cleanup (which also a tls handle leak) so we just disabled that and call the on_process_exit() function manually.

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.