Review request for 5049299
Martin Buchholz martinrb at google.com
Tue Jun 23 00:45:59 UTC 2009
clone-exec update:
I submitted the changes for this, but the jtreg tests failed on 32-bit Linux (I had only tested on 64-bit Linux).
We disabled (but did not roll back) the use of clone to allow the TL integration to proceed.
(As I promised elsewhere...) I just filed a bug against upstream glibc demonstrating the problem with clone(CLONE_VM). You can see the small C program in my bug report below. Discussion related only to the glibc bug should probably take place on the public glibc bugzilla at http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311
glibc maintainer Uli Drepper has already responded saying
"If you use clone() you're on your own."
so if we are going to fix it, we'll have to do it ourselves. Help from threading/kernel hackers appreciated.
Thanks much,
Martin
---------- Forwarded message ----------
From: martinrb at google dot com <sourceware-bugzilla at sourceware.org>
Date: Mon, Jun 22, 2009 at 12:23
Subject: [Bug nptl/10311] New: clone(CLONE_VM) fails with pthread_getattr_np on i386
To: martinrb at google.com
I'm using clone() with flags CLONE_VM, but not CLONE_THREAD. (background: I'm trying to solve the ancient overcommit failure when spawning a small Unix process from a big process).
The act of calling clone appears to mess up the pthread library, but only on i386, not on x86_64, using glibc version 2.7. (The bugzilla Version drop-down does not allow one to specify 2.7; y'all should fix that.)
Here's a shell transcript containing a program that demonstrates the problem, and shows that the problem does not occur when running in 64-bit mode on 64-bit Linux. (The problem also occurs when running in 32-bit mode on 32-bit Linux).
A program like this would be a fine addition to the glibc test suite.
$ set -x; for flag in -m32 -m64; do gcc $flag -lpthread ./clone_bug.c && ./a.out; done; cat clone_bug.c; uname -a; getconf GNU_LIBPTHREAD_VERSION; getconf GNU_LIBC_VERSION
+zsh:1464> set -x
+zsh:1464> flag=-m32
+zsh:1464> gcc -m32 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
count=2, pthread_getattr_np failed with errno = "No such process"
+zsh:1464> flag=-m64
+zsh:1464> gcc -m64 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
+zsh:1464> cat clone_bug.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>
#include <wait.h>
#include <errno.h>
#include <unistd.h>
#include <pthread.h>
#include <syscall.h>
#include <sched.h>

static void debugPrint(char *format, ...) {
  FILE *tty = fopen("/dev/tty", "w");
  va_list ap;
  va_start(ap, format);
  vfprintf(tty, format, ap);
  va_end(ap);
  fclose(tty);
}

static void debugPids(void) {
//   debugPrint("getpid()=%d gettid()=%d, syscall(getpid)=%d pthread_self=%d\n",
//           getpid(), syscall(SYS_gettid), syscall(SYS_getpid), pthread_self());
  static int count = 0;
  pthread_attr_t attr;
  int result;
  ++count;
  if ((result = pthread_getattr_np(pthread_self(), &attr)) != 0)
    debugPrint("count=%d, pthread_getattr_np failed with errno = \"%s\"\n",
               count, strerror(result));
}

static int childProcess(void *ignored) {
  _exit(0);
//   debugPrint("child\n");
//   execve("/bin/true", NULL, NULL);
//   perror("execve");
}

// I'm sure there's a better way to do this,
// but pthread_join ain't it - we can't trust it.
volatile int done = 0;

void* run(void *x) {
  const int stack_size = 1024 * 1024;
  void *clone_stack = malloc(2 * stack_size);
  int status;
  debugPids();
  int pid = clone(childProcess, clone_stack + stack_size,
                  CLONE_VM | SIGCHLD, NULL);
  waitpid(pid, &status, 0);
  debugPids();
  done = 1;
  pthread_exit(0);
  return NULL;
}

int main(int argc, char *argv[]) {
  pthread_attr_t attr;
  pthread_t tid;

  pthread_attr_init(&attr);
  pthread_create(&tid, &attr, (void* (*)(void*)) run, NULL);
  // pthread_join(tid, NULL);
  while (! done)
    ;
}
+zsh:1464> uname -a
Linux spraggett.mtv.corp.google.com 2.6.24-gg23-generic #1 SMP Fri Jan 30 14:07:49 PST 2009 x86_64 GNU/Linux
+zsh:1464> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.7
+zsh:1464> getconf GNU_LIBC_VERSION
glibc 2.7
--
           Summary: clone(CLONE_VM) fails with pthread_getattr_np on i386
           Product: glibc
           Version: 2.8
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: martinrb at google dot com
                CC: glibc-bugs at sources dot redhat dot com
  GCC host triplet: x86_64-unknown-linux-gnu
http://sourceware.org/bugzilla/show_bug.cgi?id=10311
Martin
On Thu, Jun 11, 2009 at 14:16, Martin Buchholz <martinrb at google.com> wrote:
Thanks, Michael
I'm hoping the following will placate sun studio cc:

diff --git a/src/solaris/native/java/lang/UNIXProcess_md.c b/src/solaris/native/java/lang/UNIXProcess_md.c
--- a/src/solaris/native/java/lang/UNIXProcess_md.c
+++ b/src/solaris/native/java/lang/UNIXProcess_md.c
@@ -651,6 +651,7 @@
     }
     close(FAIL_FILENO);
     _exit(-1);
+    return 0;  /* Suppress warning "no return value from function" */
 }

I'm also adding my manual test case BigFork.java. It may be helpful while implementing the Solaris version of this feature.

webrev updated.

I need a Sun bug to commit these changes for Linux. Please create one.

Synopsis: (process) Use clone(CLONE_VM), not fork, on Linux to avoid swap exhaustion <http://bugs.sun.com/viewbug.do?bugid=5049299>

Description: On Linux it is possible to use clone with CLONE_VM, but not CLONE_THREAD, which is like fork() but much cheaper and avoids swap exhaustion due to momentary overcommit of swap space. One has to be very careful in this case to not mutate global variables such as environ, but it's worth it.

Evaluation: Make it so.

See also: 5049299

Once that is done, I will commit my changes.

Thanks,
Martin
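For readers who want to see the shape of the technique described above, here is a minimal sketch in C. It is not the code from the webrev. The names spawn_args, child_exec and spawn_clone_vm are hypothetical, the 64 KB stack size is an arbitrary assumption, and the environment is passed explicitly to execve so that the child never has to touch the shared global environ. Note that this is the same clone(CLONE_VM | SIGCHLD) pattern that trips the pthread_getattr_np failure on i386 with glibc 2.7 reported above, so treat it as an illustration only.

```c
#define _GNU_SOURCE            /* exposes clone() and CLONE_VM in <sched.h> */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper type: everything the child needs, passed by pointer. */
struct spawn_args {
    const char *path;
    char *const *argv;
    char *const *envp;
};

/* Runs in the child. The child shares the parent's address space, so it
 * only reads its arguments and never writes to globals such as environ. */
static int child_exec(void *p) {
    const struct spawn_args *a = p;
    execve(a->path, a->argv, a->envp);
    _exit(127);                /* reached only if execve failed */
}

/* Hypothetical helper: run `path` via clone(CLONE_VM), wait for it, and
 * return its exit code, or -1 on any failure or abnormal termination. */
static int spawn_clone_vm(const char *path, char *const argv[],
                          char *const envp[]) {
    const size_t stack_size = 64 * 1024;   /* assumed to suffice for execve */
    struct spawn_args args = { path, argv, envp };
    char *stack;
    int status;
    pid_t pid, waited;

    assert(path != NULL && argv != NULL && envp != NULL);

    stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on x86, so pass the top of the block.
     * SIGCHLD in the flags makes the child waitable like a forked process. */
    pid = clone(child_exec, stack + stack_size, CLONE_VM | SIGCHLD, &args);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The parent keeps running concurrently with the child, so `args` and
     * `stack` must stay alive until the child has exec'd or exited.
     * Blocking in waitpid here guarantees that. */
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);
    if (waited != pid)
        return -1;             /* leak the stack rather than free it unsafely */

    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build a NULL-terminated argv and envp and call, for example, spawn_clone_vm("/bin/sh", argv, envp). Because no copy of the parent's address space is reserved, the call should not hit the momentary overcommit that fork() can cause in a very large process.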
On Thu, Jun 11, 2009 at 07:22, Michael McMahon <Michael.McMahon at sun.com> wrote:

Martin Buchholz wrote:

I broke down and finally created a "proper" webrev, just like the good old days.

http://cr.openjdk.java.net/~martin/clone-exec/

I've run the regression tests on Solaris and Linux and they seem fine.
There is a compile warning on Solaris at line 654: no return value from function.

Aside from that, I'm happy with the change now.

- Michael.