Bonita Montero
2022-05-27 17:22:28 UTC
VC++ allows thread_local variables only in executables and not in
DLLs. Thread-local storage relies on the global variable __tls_index,
for which a proper amount of storage has to be allocated per thread.
This storage must be large enough to accommodate all thread_local and
__declspec(thread) variables declared in the DLL.
Executables must have a hook, unknown to me, that notices every thread
creation and termination in order to allocate and deallocate that
amount of storage. DLLs have their DllMain, which is a clean
notification mechanism: it is called when the DLL is attached to a
process, when a thread is created and when a thread terminates. Note
that threads already running while the DLL is being loaded do not get
a DLL_THREAD_ATTACH call, and neither does the loading thread itself,
so those threads have to be handled separately. With that it should be
easy to allocate a TLS index with DLL_PROCESS_ATTACH, deallocate it
with DLL_PROCESS_DETACH, allocate a proper amount of thread-specific
memory with DLL_THREAD_ATTACH and deallocate this memory with
DLL_THREAD_DETACH. The only things that would have to be documented
for this are the variable __tls_index and a variable in read-only
memory that gives the total amount of storage needed per thread.
One issue with that is that C++ requires a thread_local object
declared in a function to be constructed when control first passes
through the definition of that variable. So there must be a boolean
flag attached to each object that needs a constructor call, which
says whether that object has already been constructed. The code
allocating the thread-local storage in DLL_THREAD_ATTACH can simply
rely on this flag being false in zeroed memory and zero the whole
block on or right after allocation.
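The guard-flag idea can be sketched in portable C++ as follows. This is only an illustration under assumptions, not what the compiler actually emits: the names LazySlot, getOrConstruct and destroyIfConstructed are hypothetical, and std::launder requires C++17.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical per-object slot inside a thread's zero-filled block:
// a guard flag followed by raw, suitably aligned storage for the object.
struct LazySlot
{
    bool constructed; // false in zeroed memory, i.e. "not constructed yet"
    alignas(std::string) unsigned char storage[sizeof(std::string)];
};

// Construct the string on first use only; later calls return the same object.
std::string &getOrConstruct( LazySlot &slot, char const *init )
{
    assert( init != nullptr );
    if( !slot.constructed )
    {
        ::new( static_cast<void *>( slot.storage ) ) std::string( init );
        slot.constructed = true;
    }
    return *std::launder( reinterpret_cast<std::string *>( slot.storage ) );
}

// Run the destructor if the object was ever constructed (thread-exit path).
void destroyIfConstructed( LazySlot &slot )
{
    if( slot.constructed )
    {
        std::launder( reinterpret_cast<std::string *>( slot.storage ) )->~basic_string();
        slot.constructed = false;
    }
}
```

A thread's block could be obtained with std::calloc( 1, sizeof(LazySlot) ), which already delivers the zeroed flag. The first call to getOrConstruct then runs the constructor, and every later call on the same slot skips it.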
I implemented this by relying on undocumented details, but it is
rather simple:
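#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <cstring>
#include <string>
#include <new>
using namespace std;
// image-specific TLS index (undocumented name)
DWORD __tls_index;
BOOL APIENTRY DllMain( HMODULE hModule, DWORD dwReason, LPVOID lpReserved )
{
    switch( dwReason )
    {
    case DLL_PROCESS_ATTACH:
        // allocate the TLS slot for this image
        ::__tls_index = TlsAlloc();
        if( ::__tls_index == TLS_OUT_OF_INDEXES )
            return FALSE;
        // the loading thread gets no DLL_THREAD_ATTACH, so fall through
    case DLL_THREAD_ATTACH:
        // zeroed block, so every "already constructed" flag starts as false;
        // 0x1000 is a guess that is larger than actually needed
        TlsSetValue( __tls_index, (void *)LocalAlloc( LMEM_ZEROINIT, 0x1000 ) );
        break;
    case DLL_THREAD_DETACH:
        // free the block of the terminating thread
        LocalFree( (HLOCAL)TlsGetValue( __tls_index ) );
        break;
    case DLL_PROCESS_DETACH:
        // free the block of the detaching thread and release the slot
        LocalFree( (HLOCAL)TlsGetValue( __tls_index ) );
        TlsFree( __tls_index );
        break;
    }
    return TRUE;
}
extern "C"
__declspec(dllexport)
void myExport( char *out )
{
    // constructed once per thread, on the first call from that thread
    thread_local string str( "hello world, hello world, hello world" );
    // the caller must supply a buffer large enough for the string
    strcpy( out, str.c_str() );
}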
The least clean part of this solution is that I allocate an amount
of memory that's for sure larger than needed. A documented read-only
"variable", calculated and generated by the linker, would be useful
here. The other undocumented detail, although a more reliable one, is
the image-specific variable __tls_index, which would also have to be
documented.