Discussion:
Found an undocumented way to have thread_local variables in DLLs
Bonita Montero
2022-05-27 17:22:28 UTC
VC++ allows thread_local variables only in executables and
not in DLLs. Thread-local storage relies on the global variable
__tls_index, for which proper storage has to be allocated per thread.
This storage must be large enough to accommodate all thread_local and
__declspec(thread) variables declared in any function of the DLL.
Executables must have a hook unknown to me to notice any thread
creation and termination to allocate and deallocate that amount of
storage. DLLs have their DllMain, which has a clean notification
mechanism for when a DLL is being attached to a process, when a thread
is created, for already running threads while this DLL is being
loaded, and for thread termination. With that it should be easy to
allocate a TLS index with DLL_PROCESS_ATTACH, deallocate it with
DLL_PROCESS_DETACH, allocate a proper amount of thread-specific
memory with DLL_THREAD_ATTACH and deallocate this memory with
DLL_THREAD_DETACH. The only thing that has to be documented with
that would be the variable __tls_index as well as a variable in
read-only memory which gives the total amount of storage that
needs to be allocated for a thread.
One issue with that is that C++ requires a thread_local object
to be constructed when the code first comes across the definition
of that variable. So there must be a boolean variable attached to
each object constructed by a constructor that says whether that
object has already been constructed. The code allocating
the thread-local storage in DLL_THREAD_ATTACH could simply rely
on this boolean variable being zeroed memory and zero the whole
block with or on allocation.
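
A minimal sketch of the guard-flag idea from the paragraph above, in portable C++ rather than anything MSVC actually emits. The names GuardedSlot, getOrConstruct and destroyIfConstructed are made up for illustration. The only assumption carried over from the post is that the per-thread block starts out zeroed, so the flag initially reads false.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical layout of one object inside a thread's block: a guard flag
// followed by raw, suitably aligned storage for the object itself.
struct GuardedSlot {
    bool constructed;  // false while the block is still zeroed
    alignas(std::string) unsigned char storage[sizeof(std::string)];
};

// View the raw storage as the string that lives (or will live) there.
std::string* asString(GuardedSlot& slot) {
    return reinterpret_cast<std::string*>(slot.storage);
}

// Construct the string on first use only, then hand out the same object.
std::string& getOrConstruct(GuardedSlot& slot, const char* init) {
    assert(init != nullptr);
    if (!slot.constructed) {
        new (slot.storage) std::string(init);  // placement-new into the block
        slot.constructed = true;
    }
    return *asString(slot);
}

// Thread-exit counterpart: destroy the object only if it was ever built.
void destroyIfConstructed(GuardedSlot& slot) {
    using S = std::string;
    if (slot.constructed) {
        asString(slot)->~S();
        slot.constructed = false;
    }
}
```

A second call to getOrConstruct on the same slot skips the constructor and returns the object built by the first call, which is the behaviour the boolean is there to guarantee.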

I implemented this in an undocumented way, but that's rather
simple:

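#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <cstring>
#include <string>
#include <new>

using namespace std;

// the index the post assumes the compiler-generated thread_local accesses use
DWORD __tls_index;

BOOL APIENTRY DllMain( HMODULE hModule, DWORD dwReason, LPVOID lpReserved )
{
    switch( dwReason )
    {
    case DLL_PROCESS_ATTACH:
        // reserve one TLS slot for the whole DLL
        ::__tls_index = TlsAlloc();
        if( ::__tls_index == TLS_OUT_OF_INDEXES )
            return FALSE;
        break;
    case DLL_THREAD_ATTACH:
        // zeroed block, so all "already constructed" flags start out false
        TlsSetValue( __tls_index, (void *)LocalAlloc( LMEM_ZEROINIT, 0x1000 ) );
        break;
    case DLL_THREAD_DETACH:
        // release this thread's block
        LocalFree( (HLOCAL)TlsGetValue( __tls_index ) );
        break;
    case DLL_PROCESS_DETACH:
        // give the slot back
        TlsFree( __tls_index );
        break;
    }
    return TRUE;
}

extern "C"
__declspec(dllexport)
void myExport( char *out )
{
    // constructed on the calling thread's first call
    thread_local string str( "hello world, hello world, hello world" );
    // the caller must supply a buffer large enough for the string
    strcpy( out, str.c_str() );
}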

The most unclean part of that solution is that I allocate an amount
of memory that's for sure larger than needed. A documented read-only
"variable" calculated and generated by the linker would be useful
here. The next undocumented issue, which is more reliable, is the
image-specific variable __tls_index, which would also have to be documented.
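
For what it's worth, the PE format does describe a per-image TLS directory whose fields bound the TLS template. The sketch below shows how a block size could be derived from those fields. Whether that is the read-only "variable" wished for above is an assumption, not something the post establishes. The struct is redeclared here under a made-up name (TlsDirectory64Sketch, mirroring IMAGE_TLS_DIRECTORY64 from winnt.h) only so the sketch builds without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative mirror of the 64-bit PE TLS directory; field names follow
// IMAGE_TLS_DIRECTORY64, the struct name itself is hypothetical.
struct TlsDirectory64Sketch {
    std::uint64_t StartAddressOfRawData;  // start of the initialised TLS template
    std::uint64_t EndAddressOfRawData;    // end of that template
    std::uint64_t AddressOfIndex;         // where the loader stores the image's TLS index
    std::uint64_t AddressOfCallBacks;     // null-terminated array of TLS callbacks
    std::uint32_t SizeOfZeroFill;         // extra zeroed bytes after the template
    std::uint32_t Characteristics;        // alignment flags
};

// Bytes one thread would need for this image's TLS block:
// the initialised template plus the zero-filled tail.
std::uint64_t tlsBlockSize(const TlsDirectory64Sketch& dir) {
    assert(dir.EndAddressOfRawData >= dir.StartAddressOfRawData);
    return (dir.EndAddressOfRawData - dir.StartAddressOfRawData) + dir.SizeOfZeroFill;
}
```

With a size obtained this way, the fixed 0x1000 in the DllMain above would no longer be a guess. Locating the directory inside a loaded image is left out of the sketch.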
R.Wieser
2022-05-27 20:39:28 UTC
Bonita,
VC++ allows thread_local variables only in executables and not
in DLLs.
That's odd, as they seem to have been purposely created for DLL usage ...

https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc
Thread-local storage relies on the global variable __tls_index, for
which proper storage has to be allocated per thread.
I'm not sure you are aware of how differently that can be read from how
you meant it ...

Luckily you again explain it a bit lower, and with a bit more info to go on
: "it should be easy to allocate a TLS index with DLL_PROCESS_ATTACH,"
This storage must be large enough to accommodate all thread_local
and __declspec(thread) variables declared in any function of the DLL.
You did not yet explain what "this storage" is, but further on I see you
allocate some memory and put its pointer/handle into Tls storage. I'm
assuming that that is what you are talking about.
Executables must have a hook unknown to me to notice any thread
creation and termination to allocate and deallocate that amount of
storage.
No hook needed : When a process creates a thread it executes the same way
as a function : it starts at the top and exits at the bottom. That means
that the threads codeblock can just allocate that memory at the top, and
deallocate it at the bottom, just before exiting.
The only thing that has to be documented with
that would be the variable __tls_index
Yup.
as well as a variable in read-only memory which gives the total amount
of storage that needs to be allocated for a thread.
Shouldn't you already know that when you create that DLL ? IOW, a
constant set in the DLL's source code should cover it.
One issue with that is that C++ requires a thread_local object
to be constructed when the code first comes across the definition
of that variable. So there must be a boolean variable attached to
each object constructed by a constructor that says whether that
object has already been constructed.
If your program comes across the Tls storage initialisation code multiple
times you're probably doing something wrong. If all is well you only
encounter it at the top (of the program or the DLL), and then never again.

But if you have that problem, why not use the variable that stores the index
as the "boolean" itself. Just initialize it with the TLS_OUT_OF_INDEXES
value. If you see that it means you need to run TlsAlloc. If that same
value is returned by TlsAlloc it means you have a big problem, and need to
abort. IOW, after having run TlsAlloc once you either abort or the
variable storing the index has changed.
The most unclean part of that solution is that I allocate an amount
of memory that's for sure larger than needed.
As mentioned in the above, if you are writing the DLLs thread you should be
able to know how much thread-global storage you need for it.

Regards,
Rudy Wieser
Bonita Montero
2022-05-27 20:56:09 UTC
You did not yet explain what "this storage" is, ...
Ok, you don't know it.
No hook needed : When a process creates a thread it executes the same way
as a function : it starts at the top and exits at the bottom. That means
that the threads codeblock can just allocate that memory at the top, and
deallocate it at the bottom, just before exiting.
Thread-local storage for a thread on Windows is allocated _before_ the
thread gets its first timeslice. You can even create multiple threads
from DLL_PROCESS_ATTACH - they won't run before DLL_THREAD_ATTACH has
been called for each of them.
Shouldn't you already know that when you create that DLL ?
IOW, a constant set in the DLLs sourcecode should cover it.
DLLs don't support TLS.
If your program comes across the Tls storage initialisation code multiple
times you're probably doing something wrong. ...
This doesn't happen since the operating system makes the
DLL_THREAD_ATTACH call only once for each thread. And the initialization
of a thread_local variable with a constructor is guarded by a
simple boolean value, also residing in TLS, which is initially
false and becomes true only once (similar to static initialization
since C++11, but without any synchronization).
But if you have that problem, why not use the variable that stores the index
as the "boolean" itself.
Because TLS does need much more information per thread than a single
boolean.
Just initialize it with the TLS_OUT_OF_INDEXES value.
*facepalm*
Are you trolling ?
As mentioned in the above, if you are writing the DLLs thread you should
be able to know how much thread-global storage you need for it.
The DLL itself determines how much thread-local storage has to
be allocated, it's the same for all targets.
Chris M. Thomasson
2022-05-27 21:26:59 UTC
Post by Bonita Montero
You did not yet explain what "this storage" is, ...
Ok, you don't know it.
No hook needed : When a process creates a thread it executes the same way
as a function : it starts at the top and exits at the bottom. That means
that the threads codeblock can just allocate that memory at the top, and
deallocate it at the bottom, just before exiting.
Thread-local storage for a thread on Windows is allocated _before_ the
thread gets its first timeslice. You can even create multiple threads
from DLL_PROCESS_ATTACH - they won't run before DLL_THREAD_ATTACH has
been called for each of them.
Shouldn't you already know that when you create that DLL ?
IOW, a constant set in the DLLs sourcecode should cover it.
DLLs don't support TLS.
[...]

https://docs.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library
Bonita Montero
2022-05-27 21:33:36 UTC
Post by Chris M. Thomasson
Post by Bonita Montero
You did not yet explain what "this storage" is, ...
Ok, you don't know it.
No hook needed : When a process creates a thread it executes the same way
as a function : it starts at the top and exits at the bottom. That means
that the threads codeblock can just allocate that memory at the top, and
deallocate it at the bottom, just before exiting.
Thread-local storage for a thread on Windows is allocated _before_ the
thread gets its first timeslice. You can even create multiple threads
from DLL_PROCESS_ATTACH - they won't run before DLL_THREAD_ATTACH has
been called for each of them.
Shouldn't you already know that when you create that DLL ?
IOW, a constant set in the DLLs sourcecode should cover it.
DLLs don't support TLS.
[...]
https://docs.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library
Ok, but that's manual TLS.
R.Wieser
2022-05-28 08:27:02 UTC
Bonita,
Post by Bonita Montero
Post by Chris M. Thomasson
https://docs.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library
Ok, but that's manual TLS.
:-) Oh boy, I would really like to see your face when you discover the
Assembly language, and realize that you've been fed VC++ freebies all this
time ...

Or, in other words : If-and-when you have a TLS storage slot available to
you inside threads in your main program, then that is something the VC++
programming language does (injects) for you.

No idea why though, as I've never had the need for TLS slots in my main
program - as, as I already explained, threads work the same way as functions
in regard to its local variables.

In a program's thread - the main one, later ones, or ones created by it -
you could just store that memory pointer in a variable local to that thread.
It will still be there and valid when you reach the function's or thread's
end.

Regards,
Rudy Wieser
Bonita Montero
2022-05-28 08:37:22 UTC
Post by R.Wieser
:-) Oh boy, I would really like to see your face when you discover the
Assembly language, and realize that you've been fed VC++ freebies all this
time ...
Compiler- and platform-supported thread-local storage, without having
to do all the TLS slot allocation and memory allocation for each thread
manually, is of practical use and not a freebie for sure.
Post by R.Wieser
No idea why though, as I've never had the need for TLS slots in my main
program - as, as I already explained, threads work the same way as functions
in regard to its local variables.
You don't understand the difference between normal local variables and
local variables declared as thread-local. The latter need more
specific support from the compiler and the platform - on all platforms,
even Linux.
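
A small portable sketch of the difference being argued about, using only standard C++11 (std::thread and thread_local). The function names are made up for the example. An ordinary local is re-created on every call, whereas a thread_local local keeps one instance per thread that survives between calls on that thread.

```cpp
#include <cassert>
#include <thread>

// Ordinary local: re-created on every call, so this always returns 1.
int bumpPlainLocal() {
    int calls = 0;
    return ++calls;
}

// thread_local local: one instance per thread, kept between calls.
int bumpThreadLocal() {
    thread_local int calls = 0;
    return ++calls;
}

// Calls both functions n times on a fresh thread and reports the last results.
void runOnNewThread(int n, int& plainResult, int& tlsResult) {
    assert(n > 0);
    std::thread worker([&] {
        for (int i = 0; i < n; ++i) {
            plainResult = bumpPlainLocal();
            tlsResult = bumpThreadLocal();
        }
    });
    worker.join();
}
```

Each new thread starts its own count from zero, and nothing one thread does to its instance is visible to another. That per-thread lifetime is the part that needs help from the compiler and the platform.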
Post by R.Wieser
In a programs thread - the main one or ones later or ones created by it -
you could just store that memory pointer in a variable local to that thread.
That wouldn't make any sense because this pointer isn't thread-local
itself.
Post by R.Wieser
It will still be there and valid when you reach the functions or threads
end.
You're ultimately stupid.
R.Wieser
2022-05-28 13:01:13 UTC
Bonita,
Post by Bonita Montero
You don't understand the difference between normal local variables and
local variables being declared as thread-local.
I think I do, but if you say so.
Post by Bonita Montero
Post by R.Wieser
In a programs thread - the main one or ones later or ones created by it -
you could just store that memory pointer in a variable local to that thread.
That wouldn't make any sense because this pointer isn't thread-local
itself.
Of course it isn't. It can be used from anywhere in your program. But you
are *making* it thread-local by storing it into a TLS element - which can
only be accessed by the thread it's stored in.

I also thought to have read that you would be using that allocated memory
for stuff related to a certain thread - meaning that if any other thread
would try to use it it would probably barf and crash the whole program. And
in my book that makes the pointer itself "thread local".
Post by Bonita Montero
Post by R.Wieser
It will still be there and valid when you reach the functions or threads
end.
You're ultimately stupid.
*Of course* I am.

Well, good luck with your programming. You'll need it trying to figure out
everything yourself.

Regards,
Rudy Wieser
Bonita Montero
2022-05-28 15:27:14 UTC
Post by R.Wieser
Of course it isn't. It can be used from anywhere in your program. But you
are *making* it thread-local by storing it into a TLS element - which can
only be accessed by the thread it's stored in.
That's a stupid idea because the TLS slot can contain only a single
void pointer per thread. An EXE or DLL requiring only a single pointer
is extremely unlikely, and you would deny yourself the opportunity to
have more thread-local data in that image.
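
A sketch of how the two positions meet in practice: a slot holds one void pointer per thread, but that pointer can address a struct carrying as much per-thread data as the image needs. Everything named here (PerThreadData, attachThread, perThread, detachThread) is hypothetical, and a thread_local pointer stands in for the real slot so the example builds without the Windows headers. In a DLL the slot would be read and written with TlsGetValue/TlsSetValue.

```cpp
#include <cassert>
#include <string>

// Everything this module wants per thread, gathered behind one pointer.
// The struct and its fields are made up for the example.
struct PerThreadData {
    int         lastError = 0;
    std::string scratch;
    unsigned    callCount = 0;
};

// Stand-in for the single void* a TLS slot holds per thread.
thread_local void* g_slotValue = nullptr;

// What a thread-attach handler would do: allocate the block, park it in the slot.
void attachThread() {
    assert(g_slotValue == nullptr);
    g_slotValue = new PerThreadData();
}

// Any code on this thread reaches all fields through the one pointer.
PerThreadData& perThread() {
    assert(g_slotValue != nullptr);
    return *static_cast<PerThreadData*>(g_slotValue);
}

// What a thread-detach handler would do: free the block and clear the slot.
void detachThread() {
    delete static_cast<PerThreadData*>(g_slotValue);
    g_slotValue = nullptr;
}
```

This is essentially what the DllMain in the first post does with its 0x1000-byte block, except that here the layout of the block is spelled out as a struct.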
R.Wieser
2022-05-28 08:15:31 UTC
Bonita,
Post by Bonita Montero
You did not yet explain what "this storage" is, ...
Ok, you don't know it.
Not at the moment you mentioned it, no. IOW, the order in which you
explained yourself left a bit to be desired.
Post by Bonita Montero
Shouldn't you already know that when you create that DLL ?
IOW, a constant set in the DLLs sourcecode should cover it.
DLLs don't support TLS.
Whut? You have even included code showing they do.
Post by Bonita Montero
If your program comes across the Tls storage initialisation code multiple
times you're probably doing something wrong. ...
This doesn't happen since the operating system does DLL_THREAD_ATTACH
calls only once for each thread.
In that case, why did you mention the need for a boolean ?
Post by Bonita Montero
But if you have that problem, why not use the variable that stores the index
as the "boolean" itself.
Because TLS does need much more information per thread than a single
boolean.
Read what you quoted again. Notice both that I said "why not *USE*" as well
as me quoting the "boolean" word, giving an indication that it's not actually
a boolean but something else. But read on.
Post by Bonita Montero
Just initialize it with the TLS_OUT_OF_INDEXES value.
*facepalm*
Are you trolling ?
Yes, that must be it. On the other hand, you just might not have
understood what I tried to offer you :

Create the (global) __tls_index variable initialized with the value
TLS_OUT_OF_INDEXES. Then use the following:

pseudo-code:
- - - - -
if __tls_index == TLS_OUT_OF_INDEXES {
    __tls_index = TlsAlloc()
    if __tls_index == TLS_OUT_OF_INDEXES {
        Houston, we have a problem and need to abort.
    }
}
- - - - -

At this point you have either aborted, or the __tls_index variable contains
a value different from TLS_OUT_OF_INDEXES, meaning that the next time around
the Tls slot initialisation (and its result checking) will be skipped.
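
The pseudo-code above can be sketched in C++ roughly as follows. The helper name ensureTlsIndex and the allocate parameter are made up; allocate stands in for TlsAlloc so the sketch also builds off Windows. The constant uses the same numeric value as TLS_OUT_OF_INDEXES in the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Same numeric value as TLS_OUT_OF_INDEXES in the Windows headers.
constexpr std::uint32_t kOutOfIndexes = 0xFFFFFFFFu;

// Allocate the slot once; later calls see a valid index and skip allocation.
template <class AllocFn>
std::uint32_t ensureTlsIndex(std::uint32_t& index, AllocFn allocate) {
    if (index == kOutOfIndexes) {
        index = allocate();
        if (index == kOutOfIndexes)
            throw std::runtime_error("no TLS index available");  // the "Houston" case
    }
    assert(index != kOutOfIndexes);
    return index;
}
```

The index variable doubles as the "already initialised" marker, so no separate boolean is needed for the slot itself. This guards slot allocation only; it says nothing about the per-object construction flags discussed elsewhere in the thread.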
Post by Bonita Montero
As mentioned in the above, if you are writing the DLLs thread you should
be able to know how much thread-global storage you need for it.
The DLL itself determines how much thread-local storage has to
be allocated, it's the same for all targets.
You're confusing me : What is the below then all about ? :

[quote]
as well as a variable in read-only memory which gives the amount of storage
which needs to be allocated with the total amount of storage needed for a
thread.
[/quote]

Either you do /not/ know how much you need to allocate and it needs to be
gotten from where you stored it somewhere in your DLL (but where then do you
get the number from that you have stored there ?), or you /do/ know, and no
such storage is needed. Not both.


Remark :
My previous message as well as the current one have been written in an
attempt to give you some hints and further possibly worth-to-know info.
If you think you should be nasty about it I think I will just leave you to
your own devices.

Though congrats for figuring out how to use Tls in DLLs (even though that is
what it was meant for in the first place). I know from experience how
difficult the "how the h*ll does that work ?" sometimes is.

Regards,
Rudy Wieser
Bonita Montero
2022-05-28 08:37:52 UTC
Don't talk about things which you don't understand.
R.Wieser
2022-05-28 13:10:38 UTC
Bonita,
Post by Bonita Montero
Don't talk about things which you don't understand.
Do you know about the Dunning-Kruger effect ?
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

It effectively tells us that people with low knowledge often think that they
know /way/ more than they actually do. Give it a few years more experience
and you will probably cringe thinking back about the misguided confidence
you're displaying here.

Regards,
Rudy Wieser
Bonita Montero
2022-05-28 15:27:57 UTC
Post by R.Wieser
Do you know about the Dunning-Kruger effect ?
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Yes, but you don't understand that you're the one with the
Dunning-Kruger effect, not me.
R.Wieser
2022-05-28 16:40:17 UTC
Bonita,
Post by Bonita Montero
Yes, but you don't understand that you're the one with the
Dunning-Kruger effect, not me.
As a thank-you I have deleted my response to your previous post. I'm rather
sure you could have used it, as you sound rather confused in it. But hey,
I'm the one who you think knows "nuthin about nuthin", so any such help of
mine would be wasted time for you to read. Right ?

Great going kid, just smashing.

Regards,
Rudy Wieser
Bonita Montero
2022-05-28 17:28:22 UTC
Post by R.Wieser
Bonita,
Post by Bonita Montero
Yes, but you don't understand that you're the one with the
Dunning-Kruger effect, not me.
As a thank-you I have deleted my response to your previous post. I'm rather
sure you could have used it, as you sound rather confused in it. But hey,
I'm the one who you think knows "nuthin about nuthin", so any such help of
mine would be wasted time for you to read. Right ?
Sorry, you want to join a conversation without being eligible.
R.Wieser
2022-05-29 11:22:20 UTC
Bonita,
Post by Bonita Montero
Sorry, you want to join a conversation without being eligible.
Kid, all I see is a relative novice in his programming language who -
rightly so - is proud of having found something out by himself. The only
problem is that you have let that pride carry you away into thinking you
now know everything about it.

I've been doing Windows Assembly programming for over two decades, and not
even I think I know everything. Far from it actually.

And as I'm doing Assembly I have to do pretty-much everything myself. As
such I for instance know that the OS doesn't create a TLS slot before a
program is run, but what you see is the effect of your VC++ programming
language silently inserting code to do that at the top of the main thread.

Why it does that I don't know, as I've never had any use for TLS slots in
the programs main thread or threads started by it.

Bottom line, I think you could have learned quite a bit from me, even though
there still is a lot I do not know myself.

... but for that you first have to grow out of your current "no one knows
better than I do" conviction.

Regards,
Rudy Wieser
Bonita Montero
2022-05-29 11:30:27 UTC
Post by R.Wieser
Bonita,
Post by Bonita Montero
Sorry, you want to join a conversation without being eligible.
Kid, all I see is a relative novice in his programming language who -
rightly so - is proud of having found something out by himself. The only
problem is that you have let that pride carry you away into thinking you
now know everything about it.
Ok, you find a novice and don't understand the whole discussion.
Rest unread.
R.Wieser
2022-05-29 12:41:41 UTC
Bonita,
Post by Bonita Montero
Ok, you find a novice and don't understand the whole discussion.
What discussion ? All you did was show off what you found. I then
made a few comments on it and you rejected all of them - often by not
understanding anything about what I tried to explain.

In one instance I even followed it up with some pseudo-code to help you
understand what I was talking about, but you didn't even bother to
acknowledge it. Of course, that made me think that you had understood you
made a mistake the first time, but are simply not (yet) man enough to be
honest about it ...

You're talking about a discussion that I didn't understand ? There was no
discussion. That word implies two (or more) people trying to understand
each other's viewpoints - even when they do not agree with each other - which
I have not seen you do in any shape or form. :-(

Regards,
Rudy Wieser
Bonita Montero
2022-05-29 12:56:27 UTC
Post by R.Wieser
Bonita,
Post by Bonita Montero
Ok, you find a novice and don't understand the whole discussion.
What discussion ? All you did was you showing off what you found.
You constantly didn't understand what I said.
R.Wieser
2022-05-29 15:13:42 UTC
Bonita,
Post by Bonita Montero
You constantly didn't understand what I said.
Yeah, that must obviously be it. You, as a novice, understood perfectly what
I tried to tell you every time, it's just me, with two decades of experience,
who was unable to understand anything you said.

Kid, go pull someone else's leg.

Regards,
Rudy Wieser

Mr Flibble
2022-05-27 20:48:49 UTC
On Fri, 27 May 2022 19:22:28 +0200
Post by Bonita Montero
VC++ allows thread_local variables only in executables and
not in DLLs. Thread-local storage relies on the global variable
__tls_index, for which proper storage has to be allocated per thread.
This storage must be large enough to accommodate all thread_local and
__declspec(thread) variables declared in any function of the DLL.
Executables must have a hook unknown to me to notice any thread
creation and termination to allocate and deallocate that amount of
storage. DLLs have their DllMain, which has a clean notification
mechanism for when a DLL is being attached to a process, when a thread
is created, for already running threads while this DLL is being
loaded, and for thread termination. With that it should be easy to
allocate a TLS index with DLL_PROCESS_ATTACH, deallocate it with
DLL_PROCESS_DETACH, allocate a proper amount of thread-specific
memory with DLL_THREAD_ATTACH and deallocate this memory with
DLL_THREAD_DETACH. The only thing that has to be documented with
that would be the variable __tls_index as well as a variable in
read-only memory which gives the total amount of storage that
needs to be allocated for a thread.
One issue with that is that C++ requires a thread_local object
to be constructed when the code first comes across the definition
of that variable. So there must be a boolean variable attached to
each object constructed by a constructor that says whether that
object has already been constructed. The code allocating
the thread-local storage in DLL_THREAD_ATTACH could simply rely
on this boolean variable being zeroed memory and zero the whole
block with or on allocation.
I implemented this in an undocumented way, but that's rather
simple:
#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <string>
#include <new>
using namespace std;
DWORD __tls_index;
BOOL APIENTRY DllMain( HMODULE hModule, DWORD dwReason, LPVOID lpReserved )
{
    switch( dwReason )
    {
    case DLL_PROCESS_ATTACH:
        ::__tls_index = TlsAlloc();
        break;
    case DLL_THREAD_ATTACH:
        TlsSetValue( __tls_index, (void *)LocalAlloc( LMEM_ZEROINIT, 0x1000 ) );
        break;
    case DLL_THREAD_DETACH:
        LocalFree( (HLOCAL)TlsGetValue( __tls_index ) );
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}
extern "C"
__declspec(dllexport)
void myExport( char *out )
{
    thread_local string str( "hello world, hello world, hello world" );
    strcpy( out, str.c_str() );
}
The most unclean part of that solution is that I allocate an amount
of memory that's for sure larger than needed. A documented read-only
"variable" calculated and generated by the linker would be useful
here. The next undocumented issue, which is more reliable, is the
image-specific variable __tls_index, which would also have to be
documented.
I am using thread_local in DLLs without this hack; are you aware that
Microsoft fixed an issue regarding this in VS2019?

/Flibble
Bonita Montero
2022-05-27 21:01:23 UTC
Post by Mr Flibble
I am using thread_local in DLLs without this hack; are you
aware that Microsoft fixed an issue regarding this in VS2019?
You're right, I relied on outdated information.
Much effort for nothing.
Paavo Helde
2022-05-27 21:02:57 UTC
Post by Bonita Montero
VC++ allows thread_local variables with VC++ only in executables and
not in DLLs.
In general, thread_local in DLL-s with MSVC++ is working fine. I have
not seen any issues in production for several years.

There are some limitations, one cannot dll-export a thread_local
variable. But why would one want to do that?

There are also mentions in MS documentation that thread local variables
might not work so well before Windows Vista. But who remembers those times?

There are also problems that thread_local storage is initialized in
DllMain and so might not be properly initialized when accessed in old
threads which existed before dynamically loading the DLL. But it looks
like your solution does no better, it's using the same DllMain mechanism
and looks pretty simplistic.
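
One common way around the "old threads" problem described above is to allocate lazily on first access instead of relying on a thread-attach notification. The sketch below illustrates that idea only. The names LazyBlock, blockForThisThread and releaseBlockForThisThread are hypothetical, and a thread_local pointer stands in for a TLS slot so the code builds without the Windows headers. It relies on the documented behaviour that a freshly allocated slot reads as null on every thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread payload; a default-constructed block counts as "zeroed".
struct LazyBlock {
    int counter = 0;
};

// Stand-in for one TLS slot. A fresh slot reads as null on every thread,
// including threads that already existed when the module was loaded.
thread_local LazyBlock* g_lazySlot = nullptr;

// Returns the calling thread's block, creating it on first access rather
// than depending on an attach notification having run for this thread.
LazyBlock& blockForThisThread() {
    if (g_lazySlot == nullptr)
        g_lazySlot = new LazyBlock();
    assert(g_lazySlot != nullptr);
    return *g_lazySlot;
}

// Thread-exit counterpart; safe even if the thread never touched its block.
void releaseBlockForThisThread() {
    delete g_lazySlot;
    g_lazySlot = nullptr;
}
```

The cost is a null check on every access, which is the trade-off against doing all allocation up front in DllMain.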

If you insist you have solved a problem then you should first
demonstrate the problem, then show how your solution fixes that.
Bonita Montero
2022-05-27 21:32:30 UTC
Post by Paavo Helde
Post by Bonita Montero
VC++ allows thread_local variables with VC++ only in executables and
not in DLLs.
In general, thread_local in DLL-s with MSVC++ is working fine. I have
not seen any issues in production for several years.
Yes, I noticed this also. I already said that to Mr Flibble (idiotic
pseudonym), but then I thought that I never knew that there were
times when this doesn't happen. The issue was simply that my DLL
injection code doesn't allow thread_local variables for the DLL
injected into a foreign process. And then I saw that manual
thread-local storage with TlsGetValue() ... is documented very well
and thought that there must be a general drawback of thread-local
storage in DLLs.
Post by Paavo Helde
There are some limitations, one cannot dll-export a thread_local
variable. But why would one want to do that?
Limitation ? What do you want ? A GetProcAddress() (not only used
for callable code but also for arbitrary global variables) with
an attached thread-id to get the right version ???
Post by Paavo Helde
There are also mentions in MS documentation that thread local variables
might not work so well before Windows Vista. But who remembers those times?
Ooooh, I was partially right after all, so my above corrections weren't
absolutely necessary. Before Vista your code using thread-local storage
could crash when the library is loaded dynamically.
Post by Paavo Helde
There are also problems that thread_local storage is initialized in
DllMain ...
I think there are no reasons to have thread local storage in DllMain.
Post by Paavo Helde
If you insist you have solved a problem then you should first
demonstrate the problem, then show how your solution fixes that.
At least before Vista my code solves a problem because it does
the whole thread local storage initialization on its own.