RFR: 8176717: GC log file handle leaked to child processes (original) (raw)

Leo Korinth leo.korinth at oracle.com
Mon Mar 12 19:40:30 UTC 2018


On 12/03/18 17:48, Thomas Stüfe wrote:

Hi Leo,

On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com_ _<mailto:leo.korinth at oracle.com>> wrote:

On 12/03/18 15:29, Thomas Stüfe wrote: Hi Leo, This seems weird. This would affect numerous open() calls, not just this GC log, I cannot imagine the correct fix is to change all of them. Sorry, I do not understand what you mean with "numerous open()". This fix will only affect logging -- or am I missing something? os::open does roughly what I try to do in os::fopenretain. Sorry, I spoke unclear. What I meant was I would expect the problem you found in gc logging to be present for every raw open()/fopen()/CreateFile() call in the VM and in the JDK, which are quite a few. I wondered why we do not see more problems like this.

Oh, now I see, I just assumed os::open was used everywhere when in fact it is only used in two places where the file in addition seems to be closed fast afterwards. I should assume less...

I guess leaking file descriptors is not that big of a problem. It seems to have been a problem on Solaris (which seems to be the reason for os::open) but as the unix file descriptors are closed by core-libs before exec is mostly a windows problem.

On windows it is also more of a problem because open files are harder to rename or remove, and therefore the bug report.

In fact, on Posix platforms we close all file descriptors except the Pipe ones before between fork() and exec() - see unix/native/libjava/ childproc.c. Yes, that is why my test case did not fail before the fix on unix-like systems. I do not know why it is not handled in Windows (possibly a bug, possibly to keep old behaviour???), I had planned to ask that as a follow up question later, maybe open a bug report if it was not for keeping old behaviour. Even though childproc.c does close the file handler, I think it is much nicer to open them with FDCLOEXEC (in addition to let childproc.c close it). os::open does so, and I would like to handle ::fopen the same way as ::open with a proxy call that ensures that the VM process will retain the file descriptor it opens (in HotSpot at least). Such code is missing on Windows - see windows/native/libjava/ProcessImplmd.c  . There, we do not have fork/exec, but CreateProcess(), and whether we inherit handles or not is controlled via an argument to CreateProcess(). But that flag is TRUE, so child processes inherit handles. 331                    if (!CreateProcessW( 332                        NULL,             /* executable name */ 333                        (LPWSTR)pcmd,     /* command line */ 334                        NULL,             /* process security attribute */ 335                        NULL,             /* thread security attribute */ 336                        TRUE,             /* inherits system handles */          <<<<<<_ _337                        processFlag,      /* selected based_ _on exe type */_ _338                        (LPVOID)penvBlock,/* environment block */_ _339                        (LPCWSTR)pdir,    /* change to the_ _new current directory */_ _340                        &si,              /* (in)  startup_ _information */_ _341                        &pi))             /* (out) process_ _information */_ _342                    {_ _343                        win32Error(env, L"CreateProcess");_ _344                    }_ _Maybe this is the real error we should fix? Make Windows_ _Runtime.exec behave like the Posix variant by closing all file_ _descriptors upon CreateProcess > (This seems more of a core-libs question.) I think it is both a core-libs question and a hotspot question. I firmly believe we should retain file descriptors with help of FDCLOEXEC and its variants in HotSpot. I am unsure (and have no opinion) what to do in core-libs, maybe there is a deeper thought behind line 336? Some reasons for this: - if a process is forked using JNI, it would still be good if the hotspot descriptors would not leak. - if (I have no idea if this is true) the behaviour in core-libs can not be changed because the behaviour is already wildly (ab)used, this is still a correct fix. Remember this will only close file descriptors opened by HotSpot code, and at the moment only logging code. - this will fix the issue in the bug report, and give time for core-libs to consider what is correct (and what can be changed without breaking applications). Thanks, Leo yes, you convinced me. 1 We should fix raw open() calls, because if native code forks via a different code paths than java Runtime.exec(), we run into the same problem. Your patch fixes one instance of the problem. Yes, I agree. I now understand that ::open() is much more used in the code base. 2 And we should fix Windows Runtime.exec() to the same behaviour as on Posix. I can see this being backward-compatible-problematic, but it certainly would be the right thing to do. Would love to know what core-libs says.

Possibly (I am intentionally dodging this question)

Okay, about your change: I dislike that we add a new function, especially a first class open function, to the os namespace. How about this instead: since we know that os::open() does the right thing on all platforms, why can we not just use os::open() instead? Afterwards call fdopen() to wrap a FILE structure around it, respectively call "FILE* os::open(int fd, const char* mode)" , which seems to be just a wrapped fdopen(). That way you can get what you want with less change and without introducing a new API.

Yes, that might be a better solution. I did consider it, but was afraid that, for example the (significant) "w"/"w+" differences in semantics would matter. or that:

os::open(os::open(path, WINDOWS_ONLY()O_CREAT|WINDOWS_ONLY()O_TRUNC, flags, mode) mode2)

...or something similar for fopen(path, "w"), would not be exactly the same. For example it would set the file to binary mode on windows. Maybe it is exactly the same otherwise? For me, the equality in semantics are not obvious.

Also, now when I realized there is only two users of os::open I am less sure it always does the right thing...

I prefer the os::fopen_retain way.

Thanks, Leo

Kind Regards, Thomas

Kind Regards, Thomas On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com> <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>> wrote: Hi, This fix is for all operating systems though the problem only seams to appear on windows. I am creating a proxy function for fopen (os::fopenretain) that appends the non-standard "e" mode for linux and bsds. For windows the "N" mode is used. For other operating systems, I assume that I can use fcntl FSETFD FDCLOEXEC. I think this will work for AIX, Solaris and other operating systems that do not support the "e" flag. Feedback otherwise please! The reason that I use the mode "e" and not only fcntl for linux and bsds is threefold. First, I still need to use mode flags on windows as it does not support fcntl. Second, I probably save a system call. Third, the change will be applied directly, and there will be no point in time (between system calls) when the process can leak the file descriptor, so it is safer. The test case forks three VMs in a row. By doing so we know that the second VM is opened with a specific log file. The third VM should have less open file descriptors (as it is does not use logging) which is checked using a UnixOperatingSystemMXBean. This is not possible on windows, so on windows I try to rename the file, which will not work if the file is opened (the actual reason the bug was opened). The added test case shows that the bug fix closes the log file on windows. The VM on other operating systems closed the log file even before the fix. Maybe the test case should be moved to a different path? Bug: https://bugs.openjdk.java.net/browse/JDK-8176717 <https://bugs.openjdk.java.net/browse/JDK-8176717> <https://bugs.openjdk.java.net/browse/JDK-8176717_ _<https://bugs.openjdk.java.net/browse/JDK-8176717>> https://bugs.openjdk.java.net/browse/JDK-8176809 <https://bugs.openjdk.java.net/browse/JDK-8176809> <https://bugs.openjdk.java.net/browse/JDK-8176809_ _<https://bugs.openjdk.java.net/browse/JDK-8176809>> Webrev: http://cr.openjdk.java.net/~lkorinth/8176717/00/ <http://cr.openjdk.java.net/~lkorinth/8176717/00/> <http://cr.openjdk.java.net/~lkorinth/8176717/00/_ _<http://cr.openjdk.java.net/~lkorinth/8176717/00/>> Testing: hs-tier1, hs-tier2 and TestInheritFD.java (on 64-bit linux, solaris, windows and mac) Thanks, Leo



More information about the hotspot-dev mailing list