8065585: Change ShouldNotReachHere() to never return
Mikael Gerdin mikael.gerdin at oracle.com
Fri Apr 17 14:55:22 UTC 2015
On 2015-04-17 14:52, Stefan Karlsson wrote:
On 2015-04-17 13:49, Mikael Gerdin wrote:
On 2015-04-16 15:32, Stefan Karlsson wrote:
On 2015-04-16 14:33, David Holmes wrote:
Hi Stefan,
trimming ...

On 16/04/2015 10:07 PM, Stefan Karlsson wrote:
On 2015-04-16 04:23, David Holmes wrote:

Second, more important question: have you examined how this attribute affects the ability to walk the stack? We have already seen issues on some platforms where library functions, like abort(), have the noreturn attribute and as a result the call is optimized in a way that prevents the stack from being walked - see eg:
https://git.matricom.net/Firmware/bionic/commit/5f32207a3db0bea3ca1c7f4b2b563c11b895f276
though this: https://www.raspberrypi.org/forums/viewtopic.php?t=60540&p=451729 suggests that problem may have been addressed by the libc folk. But it still raises the question as to how our own noreturn functions will be handled and how they will affect stacktrace generation in hs_err logs or via gdb.

I added a call to fatal(...) in the GC code. I get correct stacktraces in gdb, but the stacktraces in the hs_err files are broken with fastdebug and product builds:

Which platforms?

On Linux x86 and x86_64.

Stack: [0x00007f12518d2000,0x00007f12519d3000], sp=0x00007f12519d0eb0, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x11db44a] VMError::report_and_die()+0x1ba
V [libjvm.so+0x7efb80] report_vm_error(char const*, int, char const*, char const*)+0x90
V [libjvm.so+0x7efc49] report_vm_error_noreturn(char const*, int, char const*, char const*)+0x9
V [libjvm.so+0x7efc63]
V [libjvm.so+0xfd7937]
V [libjvm.so+0xfeec51]
...

So what is the plan: try to get hs_err working again? Or file this under "well it seemed like a good idea"? ;-)

I'm leaning towards "seemed like a good idea", unless someone has an easy fix for these problems.

I've been looking a bit at this. It's not the stack trace per se that is broken, but the decoding of the function names is not working for some of the callers of the noreturn functions. I tried this with report_fatal using -XX:ErrorHandlerTest=5 and got the following:

0x7fb71ccd98d0: push %rbp
0x7fb71ccd98d1 <report_fatal+1>: mov %rdx,%rcx
0x7fb71ccd98d4 <report_fatal+4>: lea 0x9b4b34(%rip),%rdx
0x7fb71ccd98db <report_fatal+11>: mov %rsp,%rbp
0x7fb71ccd98de <report_fatal+14>: callq 0x7fb71ccd98c0
0x7fb71ccd98e3: data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)

So the report_fatal frame has ...98e3 as its return address, but that is actually outside the function and this causes dladdr() to return NULL in dli_saddr and dli_sname.
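
The dladdr() lookup described above can be sketched roughly as follows. This is only an illustration, not the HotSpot code: describe_address is a hypothetical helper, and it assumes a Linux/glibc-style dladdr() that can succeed for an address inside a loaded object while still leaving dli_sname and dli_saddr NULL when no symbol matches.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (g++ on glibc defines _GNU_SOURCE by default)
#include <cstddef>
#include <cstdio>

// Hypothetical helper: format an address as "symbol+0xoffset".
// Returns false when no symbol can be found for the address.
static bool describe_address(const void* addr, char* buf, std::size_t buflen) {
  Dl_info info;
  if (dladdr(addr, &info) == 0) {
    return false;  // address is not inside any loaded object
  }
  if (info.dli_sname == nullptr || info.dli_saddr == nullptr) {
    return false;  // object found, but no symbol matched the address
  }
  unsigned long offset = static_cast<unsigned long>(
      static_cast<const char*>(addr) - static_cast<const char*>(info.dli_saddr));
  std::snprintf(buf, buflen, "%s+0x%lx", info.dli_sname, offset);
  return true;
}
```

A stack printer built this way would print a bare address for any frame where the helper returns false, which matches the unnamed V [libjvm.so+0x...] lines in the trace above.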
The JVM then attempts to decode using Decoder::decode but I wasn't able to follow that code to understand why that fails. The same appears to happen for the caller of report_fatal (controlled_crash in my case) but there I can't explain why dladdr returns NULL values.

One approach could be to attempt to inject a "nop" at the end of functions which call a "noreturn" function. This would hopefully make the instruction after the call to the noreturn function part of the caller and would make symbol decoding work.

I found this mail thread: https://sourceware.org/bugzilla/show_bug.cgi?id=6522 which blames the -fcrossjumping optimization. I recompiled hotspot with OPT_CFLAGS/debug.o=-fno-crossjumping, and now I get correct stack traces with fastdebug on Linux 64 bits.
I did a more thorough investigation into this on a slowdebug build. The symbols appear to be missing because the JVM's ELF Decoder runs into an un-decodable symbol: the return PC points to a nop in between two symbols, because the frame has just called a noreturn function. After that, the Decoder sets m_status to FileInvalid and refuses to decode any more symbols. If I comment out the code that sets the fail status I get a fairly normal hs_err stacktrace:
V [libjvm.so+0xf184c8] VMError::report(outputStream*)+0x133c
V [libjvm.so+0xf19865] VMError::report_and_die()+0x411
V [libjvm.so+0x7876de] report_vm_error(char const*, int, char const*, char const*)+0xba
V [libjvm.so+0x7877d7] report_vm_error_noreturn(char const*, int, char const*, char const*)+0x3d
V [libjvm.so+0x78781b] report_should_not_call(char const*, int)+0x0
V [libjvm.so+0x92bfeb]
V [libjvm.so+0x6e10ff] GenCollectorPolicy::mem_allocate_work(unsigned long, bool, bool*)+0x283
V [libjvm.so+0x92c049] GenCollectedHeap::mem_allocate(unsigned long, bool*)+0x5d
V [libjvm.so+0x45dbe5] CollectedHeap::common_mem_allocate_noinit(KlassHandle, unsigned long, Thread*)+0x103
V [libjvm.so+0x45dda2] CollectedHeap::common_mem_allocate_init(KlassHandle, unsigned long, Thread*)+0x4e
V [libjvm.so+0x45e034] CollectedHeap::array_allocate(KlassHandle, int, int, Thread*)+0xac
V [libjvm.so+0xed2f04] TypeArrayKlass::allocate_common(int, bool, Thread*)+0xf0
V [libjvm.so+0x44ae3e] TypeArrayKlass::allocate(int, Thread*)+0x3e
V [libjvm.so+0xcef2d5] oopFactory::new_typeArray(BasicType, int, Thread*)+0x55
V [libjvm.so+0x9c5aa9] InterpreterRuntime::newarray(JavaThread*, BasicType, int)+0x147
j alloc.AllocArrays.main([Ljava/lang/String;)V+237
v ~StubRoutines::call_stub
V [libjvm.so+0x9df121] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b1
V [libjvm.so+0xd091d7] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x41
V [libjvm.so+0x9dea5a] JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x86
V [libjvm.so+0xa42306] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x200
V [libjvm.so+0xa5964a] jni_CallStaticVoidMethod+0x353
C [libjli.so+0x86ed] JavaMain+0x93c
C [libpthread.so.0+0x80a5] start_thread+0xc5
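
To illustrate why a single un-decodable frame can blank out every later frame, here is a toy model of a decoder with a latched failure status. It is a sketch under stated assumptions, not the HotSpot ELF decoder: ToyDecoder, Symbol, the sticky_failure switch and all addresses are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol entry covering [start, start + size).
struct Symbol {
  std::uintptr_t start;
  std::uintptr_t size;
  std::string name;
};

// Toy model only; names and behavior are assumptions for illustration.
class ToyDecoder {
 public:
  ToyDecoder(std::vector<Symbol> syms, bool sticky_failure)
      : syms_(std::move(syms)), sticky_failure_(sticky_failure), failed_(false) {}

  // Returns true and fills *name if some symbol covers pc.
  bool decode(std::uintptr_t pc, std::string* name) {
    if (failed_) {
      return false;  // a latched failure blocks every later lookup
    }
    for (const Symbol& s : syms_) {
      if (pc >= s.start && pc - s.start < s.size) {
        *name = s.name;
        return true;
      }
    }
    if (sticky_failure_) {
      failed_ = true;  // one miss marks the whole decoder as invalid
    }
    return false;
  }

 private:
  std::vector<Symbol> syms_;
  bool sticky_failure_;
  bool failed_;
};
```

With sticky_failure set, a lookup for a PC that falls in the padding between two symbols fails and every later lookup fails too. Without it, only that one frame stays unnamed, which corresponds to the trace obtained after commenting out the fail status.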
One problem is the line

V [libjvm.so+0x78781b] report_should_not_call(char const*, int)+0x0

I actually added a call to fatal(), but since fatal calls a noreturn function the return pc of that frame accidentally points to the first instruction in the next function, which happens to be report_should_not_call.
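
The misattribution can be reproduced with a small nearest-preceding-symbol lookup. This is a sketch, not JVM code: symbolize and Sym are hypothetical, the start address of report_fatal is made up, and only 0x78781b is taken from the trace above. Looking up the return address minus one is a technique some unwinders use for caller frames; it is shown here as an assumption for comparison, not as something proposed in this thread.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical symbol table entry; the table must be sorted by start address.
struct Sym {
  std::uintptr_t start;
  std::string name;
};

// Returns "name+0xoff" for the last symbol starting at or before pc,
// or an empty string if pc lies below the first symbol.
std::string symbolize(const std::vector<Sym>& table, std::uintptr_t pc) {
  const Sym* best = nullptr;
  for (const Sym& s : table) {
    if (s.start <= pc) {
      best = &s;
    } else {
      break;
    }
  }
  if (best == nullptr) {
    return "";
  }
  char off[32];
  std::snprintf(off, sizeof off, "+0x%lx",
                static_cast<unsigned long>(pc - best->start));
  return best->name + off;
}
```

For a table holding report_fatal at an assumed 0x7877f0 and report_should_not_call at 0x78781b, symbolize(table, 0x78781b) yields the next function at offset 0x0, while symbolize(table, 0x78781b - 1) lands back inside report_fatal.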
I wonder if this could be fixed by forcing gcc to emit a nop after the call to report_vm_error_noreturn in report_fatal and friends. asm volatile ("nop" : : :); appears to not be enough. GCC is very aggressive with noreturn, even with -O0.
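
The shape of that attempt looks roughly like the sketch below. The two functions are hypothetical stand-ins with a _sketch suffix, not the HotSpot functions. The stand-in throws so that the example can be exercised, whereas the real error path would terminate the VM.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a noreturn error reporter.
// [[noreturn]] promises the compiler that control never comes back to the caller.
[[noreturn]] void report_vm_error_noreturn_sketch(const char* msg) {
  throw std::runtime_error(msg);  // leaves by exception, never returns normally
}

// Hypothetical stand-in for a caller such as report_fatal.
void report_fatal_sketch(const char* msg) {
  report_vm_error_noreturn_sketch(msg);
  // Intended padding so that the return address still lies inside this
  // function. Because the call above is noreturn, the compiler may treat
  // this statement as unreachable and drop it.
  asm volatile("nop" : : :);
}
```

Disassembling the caller (for example with objdump -d) is one way to check whether the nop survived for a given compiler and optimization level.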
/Mikael
StefanK
/Mikael
Thanks, StefanK
Cheers, David Thanks, StefanK
Thanks, David Thanks, StefanK