Question about thread initialization (original) (raw)
David Holmes david.holmes at oracle.com
Fri Oct 19 04:21:01 UTC 2018
- Previous message: Question about thread initialization
- Next message: Question about thread initialization
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Thomas,
On 18/10/2018 8:51 PM, Thomas Stüfe wrote:
Hi David,
http://cr.openjdk.java.net/~stuefe/webrevs/8212173-thread-stack-and-base-initialized-too-late/webrev.00/webrev/ What do you think?
I think it looks okay generally.
(I had not yet time to compile this on anything other than linux)
I'm running it through our build/test system.
- I added a new method to Thread, Thread::callrun(), which calls run(). run() now protected and abstract.
- Thread::callrun() is the place to do last-minute initializations which should be run right before childclass ::run is invoked, and also common cleanup stuff after childclass::run returns.
Okay.
- I removed all non-essential parts from Threads::recordstackbaseandsize() (NMT stuff, logging) and moved it to Thread::callrun(), to be done before childclass::run is called but after Thread is initialized.
Okay
- Now Thread::recordstackbaseandsize() is safe to be called whenever, so I call it in all five platform variants of threadnativeentry, right at the start
Okay
- I removed completely the initializethread function. On Solaris, I renamed it to a much more specific "correctstackboundariesforprimordialthread()" - because thats what it is - and moved it also to os::Solaris::.
Okay
(Are you see a pattern yet? ;-)))
- For solaris only, in Thread::recordstackbaseandsize(), I call os::Solaris::correctstackboundariesforprimordialthread(). Has to be there, unfortunately, unless we split Thread::recordstackbaseandsize() in two parts. I do not like platform ifdefs but I like this better than a pseudo generic "initializethread" which exists solely for one platform.
I think it is fine in this case.
- I removed all calls to Thread::recordstackbaseandsize() from all child classes ::run functions.
Okay
- I unified logging: since we now have a common shared hook before Thread::run(), we can do common things like logging there instead of spread over all platforms.
A nit here as we have lost the platform specific kernel thread identifier in the logging
Thanks, David
Best, Thomas
On Thu, Oct 18, 2018 at 10:08 AM David Holmes <david.holmes at oracle.com> wrote:
On 18/10/2018 5:52 PM, Thomas Stüfe wrote: Hi David, On Thu, Oct 18, 2018 at 3:29 AM David Holmes <david.holmes at oracle.com> wrote:
Hi Thomas, On 17/10/2018 10:33 PM, Thomas Stüfe wrote: This proves to be more difficult than I initially thought :/
The problem is that we need to initialize stack base, size from within the newly started thread itself: for stackbase, this is obvious. And even stacksize - which should usually be equal to the requested stack size known to the parent thread - might be larger, actually. But actually, on some platforms the thread-start-handshake is differently implemented than on Linux. On Linux, nativethreadentry() starts runnning right away after pthreadcreate(), and then waits for the handshake. On Solaris/AIX/Windows we simply create the new thread in suspended state, and in os::startthread(), we resume it. So nativethreadentry() never ran. Hmmm my (incorrect) recollection was that the "start suspended" ability only handled the first part of the handshake, I didn't realize it removed it all. I also didn't recall it being used other than on Solaris - but I've never looked at the AIX code (I assumed it would be a copy of Linux). I think I wrote that part for AIX but cannot recollect anymore why we do it the solaris/windows way instead of the linux way. Maybe it just seemed more natural. Wild guess, the Linux handshake code feels like a workaround from pre-nptl times, maybe because threads could not be started in a suspended state? POSIX pthreads doesn't support starting suspended - and AFAIK neither does NPTL! Now we have a hen-end-egg problem: we want to call Threads::add() before os::startthread() (I suppose, since the Thread may be short lived and we do not want to add a possibly dead thread to the Threads list). But we do not have the stack dimensions, since the thread on these platforms never ran and hence had no opportunity to initialize Thread::stackbase/size. Yes there is an unavoidable window where the thread is visible to "iterators" but has not yet initialized itself to the point where it is necessarily safe to interact with. Safepoint based iteration is not an issue, but we also seem to have a number of non-safepoint-based iterations occurring. But if we can initialise the stack base and size as the first thing in threadnativeentry, that should at least make the window as small as possible - no? And also handle the crash reporting case. I think this is what I will do. That will mitigate the problem somewhat for solaris, windows and solve it completely for Linux and Mac. Ok. Not really sure what to do here: I guess one could replace the resume mechanism on Solaris/Windows/AIx with the type of monitor-based handshake we do on Linux. But that seems heavy handed and may also make thread creation slower. Other than Windows that would be the result if we standardized on POSIX-based threading. But that's not an insignificant effort in itself, and the performance implications would still need examining. Yes, this looks like time consuming work, and I do not have much time left on this issue. In general I wonder whether there would be really any performance implications with the handshake technique - and if there are, why we use it then on Linux instead of cresting the thread in suspended state (I know, it is probably all historical). As I said no start-suspended with POSIX pthreads or NPTL. BTW by chance this bug seems to show code that is both susceptible to finding new threads too soon, and also not preventing threads from terminating whilst in use: https://bugs.openjdk.java.net/browse/JDK-8212207 Seems similar, yes. Wonder why we crash though. Should we not simply return EINVAL if the pthread id is invalid? It depends how long the id has been invalid. It seems that the threads library has short-term memory when it comes to using the pthreadt of a thread that recently terminated - the pthreadt->tid field is set to a negative value and it will cause ESRCH in that case. But if the pthreadt memory has been reclaimed and reused ... potential boom! Cheers, David ..Thomas Cheers, David Cheers, Thomas On Mon, Oct 15, 2018 at 9:58 PM David Holmes <david.holmes at oracle.com> wrote:
Hi Thomas, I'm not sure I see a reason for splitting. Just move recordstackbaseandsize after the essential initialization in threadnativeentry - probably just before the: loginfo(os, thread)("Thread is alive (tid: ... as long as it happens before the handshake to the creating thread it should fix the problem of being able to see the new thread before its stack is set. The early crash problem is improved even if not perfect and better handled in the error reporter IMHO - as it needs to be resilient anyway. Cheers, David On 16/10/2018 1:41 AM, Thomas Stüfe wrote: Hi David,
On Mon, Oct 15, 2018 at 7:32 AM David Holmes <david.holmes at oracle.com> wrote:
On 13/10/2018 5:49 PM, Thomas Stüfe wrote: On Sat, Oct 13, 2018 at 8:36 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
Hi David, On Sat, Oct 13, 2018 at 1:42 AM David Holmes <david.holmes at oracle.com> wrote:
Hi Thomas, On 13/10/2018 3:56 AM, Thomas Stüfe wrote: Hi David,
thank you for your brain cycles. On Fri, Oct 12, 2018 at 12:39 PM David Holmes <david.holmes at oracle.com> wrote:
Hi Thomas, On 12/10/2018 6:18 PM, Thomas Stüfe wrote: Hi all,
a small question. JVMStartThread calls new JavaThread() JavaThread::JavaThread() calls os::createthread() os::createthread() starts the new thread and waits for the handshake, then returns Back in JVMStartThread, we call JavaThread::prepare(), which adds the new thread to the Threads list. By that time, the new thread is already running, but how far it has gotten is unknown. Right - though the new thread can't enter run() until after it has been prepared and "started". There are two parts to the handshake: Parent thread New Thread start new Thread wait for new thread signal parent wait for parent prepare new thread "start" new thread signal new thread run()
Ah, I see. The new thread is taken off the leash only at the end of JVMStartThread(), when Thread::start() is called and the final part of the handshake is completed. The new thread's stack dimensions are set from within Thread::run() (for some reason, every child class does this on its own?) by calling Thread::recordstackbaseandsize(). So, after the handshake with its parent thread. Why? Good question. Undoubtedly historical. :) This means we have a little race: in the Threads list there may be threads which have been just created and Thread::run() did not yet get around to set the stack size. In tests I stumbled over this, very rarely, when iterating Threads to check the stack sizes. Hmmm. Threads that are still threadnew should really be invisible until more fully initialized. How exactly were you iterating the threads? This might be an oversight in the related APIs. This was because of a little unrelated test I wrote to accumulate the combined thread stack sizes. They did not add up and I was confused. I am now less confused: In JVMStartThread(): { grab thread list lock new JavaThread() - calls pthreadcreate, new thread starts and waits for handshake. stack base, size still NULL,0. JavaThread::prepare() - thread gets added to Threads list a } // relinquish threads lock ... ... Threads::start() - signal new thread to run b - new thread completes handshake, calls Thread::run(), and all implementations pretty much immediately set the stackbase/size. Between (a) and (b) another thread could grab the Threads lock, iterate the threads and would see the new thread with uninitialized base/size. To prove this I did a little test right there, in JVMStartThread, after (a): + { + MutexLocker mu(Threadslock); + MyThreadClosure tc; + Threads::threadsdo(&tc); + } the thread closure just prints the stack dimensions of the thrad. Sure enough, the new thread still has NULL/0 for stack base/size. In fact, Thread::stackbase() asserts because of this. So, could not another thread happen to do the same in this little time interval? There should be some rules about how thread closures gather their target threads based on the thread state. Part of that should (now that I think about it) filter out threads that are still "new" - but whether they do or not is a different matter. This may well just be an oversight. I think so too. Is there any reason why we could not just call recordstackbaseandsize() before calling Thread::run(), right at the start of the native entry function (threadnativeentry in the case of Linux)? Have you tried it? :) I can't immediately see why this can't happen in the threadnativeentry. It's possible there was once a dependency with something in thread preparation etc, but that's just speculation on my part. Funny, I wanted to test it and then it occurred to me that we do this all along already on AIX: At the entrace of threadnativeentry I set stack base and size (that was even my own code from the initial AIX port). Unfortunately I have lost the history for that change and do not know anymore why I did this. Since we never had problems on AIX I guess this is okay for other platforms too - as long os::currentstackbase() / currentstacksize() have no side effects and do not rely on anything but OS functionality. Okay so are you going to propose making such a change? Yes. I'll prepare a webrev. Yikes Thread::recordstackbaseandsize() is complex. That thing's name is a blatant lie, it does way more than that, especially on Solaris. To be fair the problem is in the naming of os::initializethread() and the fact that on Solaris it has also been used for additional thread initialization when only intended for fixing up issues with the primordial thread's stack. And it seems only an issue with Solaris as everywhere else it is a empty method (and I'm unclear how it ended up as an .cpp function!) Okay maybe I overreacted. The function name was probably well choosen some time in the past and then the function grew in scope. I would think you can move recordstackbaseandsize() in threadnativeentry; delete initializethread(Thread t) completely. Put the code that was in the Solaris initializethread into the Solaris naitvethreadentry after recordstackbaseandsize(). Replacing os::initializethread() with a Solaris-only local solution seems clean and very reasonable. I am not so sure of moving the call to Thread::recordstackbaseandsize() up to threadnativeentry, not in its current form. I worry about this part: #if INCLUDENMT // record thread's native stack, stack grows downward MemTracker::recordthreadstack(stackend(), stacksize()); #endif // INCLUDENMT logdebug(os, thread)("Thread " UINTXFORMAT " stack dimensions: " PTRFORMAT "-" PTRFORMAT " (" SIZEFORMAT "k).", os::currentthreadid(), p2i(stackbase() - stacksize()), p2i(stackbase()), stacksize()/1024); Both NMT recording and logging are probably sensitive to the current Thread not fully initialized or Thread::current() not set at all (depending on how early we want to set stackbase and size). So I still prefer to chop this function in two, move the stackbase and size part up to the start of nativethreadentry, and leaving the rest where it is, under a new name. I feel unsure about moving this up to the start of the native entry function, since it is not aware of its surrounding Thread class being only half initialized. On AIX, we just call "setstackbase" and "setstacksize" directly, and later, in shared code, run thru Thread::recordstackbaseandsize() like everyone else. So, we just pre-initialized base and size. I do not want to do this for all platforms, since this is ugly. I wonder whether a better solution would be to change Thread::recordstackbaseandsize(): 1 move the calls to setstackbase() and setstacksize() to the OS specific threadnativeentry(). We can call this right away in the newly born thread, since it relies on nothing else. 2 leave the rest in place and rename the function to something like "Thread::finishinitialization()" 3 move the call to Thread::finishinitialization() up out of the ::run() functions to threadnativeentry() - just before it calls Thread::run(). 4 Alternative: to (3), make Thread::run() a non-virtual public function - a place for os-generic and thread-class-generic common stuff to-be-run-before-run(). Add a second, protected method, virtual Thread::dorun() = 0, which is the real function child classes should override. BTW I remember now why I did move the stack base/size initialization on AIX up to the start of threadnativeentry. We had crashes in newly spawned child threads and no usable hs-err files since to print a stack trace you need to know the stack dimensions. Another reason to change this. Well ... a thread should be doing very little interesting in terms of crashes prior to calling run() and setting up the stack information, and there will always be somewhere it can crash before this is set up, so perhaps the error reporter also needs to be bit more cautious about assuming things about the thread initialization state and the ability to use the Thread API. I agree. I think on AIX we are now able to print a stack trace without knowing the stack dimensions from Thread. Best Regards, Thomas Cheers, David ..Thomas Cheers, Thomas Cheers, David Cheers, Thomas Am I missing something here? I guess we will find out :) Cheers, David Thanks, Thomas
- Previous message: Question about thread initialization
- Next message: Question about thread initialization
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]