(clone-exec update:
I submitted the changes for this, but jtreg tests failed on 32-bit Linux
(I had only tested on 64-bit Linux).
We disabled (but did not roll back) the use of clone to allow the
TL integration to proceed.)
(As I promised elsewhere...)
I just filed a bug against upstream glibc demonstrating the problem
with clone(CLONE_VM).  You can see the small C program
in my bug report below.
Probably any discussion related just to the glibc bug
can occur on the public glibc bugzilla at
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311
glibc maintainer Uli Drepper has already responded
saying
"If you use clone() you're on your own."
so if we are going to fix it, we'll have to do it ourselves.
Help from threading/kernel hackers appreciated.
Thanks much,
Martin
From: martinrb at google dot com <sourceware-bugzilla@sourceware.org>
Date: Mon, Jun 22, 2009 at 12:23
Subject: [Bug nptl/10311] New: clone(CLONE_VM) fails with pthread_getattr_np on i386
To: martinrb@google.com
I'm using clone() with flags CLONE_VM, but not CLONE_THREAD.
(background: I'm trying to solve the ancient overcommit failure
when spawning a small Unix process from a big process).
The act of calling clone appears to mess up the pthread library,
but only on i386, not on x86_64, using glibc version 2.7
(The bugzilla Version drop-down does not allow one to specify 2.7;
y'all should fix that)
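
For readers unfamiliar with this flag combination, here is a minimal sketch of what clone(CLONE_VM) without CLONE_THREAD gives you: a child that is a separate process (it has its own pid and can be reaped with waitpid) but runs in the parent's address space. It is not part of the bug report. The names shared_flag, child_fn and clone_vm_roundtrip are made up for illustration, the 64 KiB stack size is an arbitrary choice, and the code assumes a downward-growing stack as on x86.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for clone() and CLONE_VM */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the child. The parent sees the write only because
 * CLONE_VM makes both processes share one address space. */
static volatile int shared_flag = 0;

/* Hypothetical child entry point: store the argument, then exit. */
static int child_fn(void *arg) {
    shared_flag = *(int *)arg;
    return 0;                  /* becomes the child's exit status */
}

/* Hypothetical helper: returns the value the child stored,
 * or -1 if clone/waitpid failed or the child exited abnormally. */
int clone_vm_roundtrip(int value) {
    const size_t stack_size = 64 * 1024;   /* arbitrary, assumed sufficient */
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    shared_flag = 0;
    /* Assumes the stack grows down, so pass the top of the allocation.
     * SIGCHLD as the termination signal lets plain waitpid() reap the child. */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);               /* safe: the child is gone by now */
    if (reaped != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_flag;
}
```

With fork() the parent would still see 0 after the child exits; with CLONE_VM it sees the child's write, which is exactly why the child must be careful about what it touches.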
Here's a shell transcript containing a program
that demonstrates the problem, and shows that
the problem does not occur when running in 64-bit mode
on 64-bit Linux.  (The problem also occurs when running in 32-bit mode
on 32-bit Linux).
A program like this would be a fine addition to the glibc test suite.
$ set -x; for flag in -m32 -m64; do gcc $flag -lpthread ./clone_bug.c &&
./a.out; done; cat clone_bug.c; uname -a; getconf GNU_LIBPTHREAD_VERSION;
getconf GNU_LIBC_VERSION
+zsh:1464> set -x
+zsh:1464> flag=-m32
+zsh:1464> gcc -m32 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
count=2, pthread_getattr_np failed with errno = "No such process"
+zsh:1464> flag=-m64
+zsh:1464> gcc -m64 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
+zsh:1464> cat clone_bug.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>
#include <wait.h>
#include <errno.h>
#include <unistd.h>
#include <pthread.h>
#include <syscall.h>
#include <sched.h>
static void
debugPrint(char *format, ...) {
  FILE *tty = fopen("/dev/tty", "w");
  va_list ap;
  va_start(ap, format);
  vfprintf(tty, format, ap);
  va_end(ap);
  fclose(tty);
}
static void debugPids(void) {
//   debugPrint("getpid()=%d gettid()=%d, syscall(getpid)=%d pthread_self=%d\n",
//              getpid(), syscall(SYS_gettid), syscall(SYS_getpid), pthread_self());
  static int count = 0;
  pthread_attr_t attr;
  int result;
  ++count;
  if ((result = pthread_getattr_np(pthread_self(), &attr)) != 0)
    debugPrint("count=%d, pthread_getattr_np failed with errno = \"%s\"\n",
               count, strerror(result));
}
static int childProcess(void *ignored) {
  _exit(0);
  // debugPrint("child\n");
  // execve("/bin/true", NULL, NULL);
  // perror("execve");
}
// I'm sure there's a better way to do this,
// but pthread_join ain't it - we can't trust it.
volatile int done = 0;
void* run(void *x) {
  const int stack_size = 1024 * 1024;
  void *clone_stack = malloc(2 * stack_size);
  int status;
  debugPids();
  int pid = clone(childProcess, clone_stack + stack_size,
                  CLONE_VM | SIGCHLD, NULL);
  waitpid(pid, &status, 0);
  debugPids();
  done = 1;
  pthread_exit(0);
  return NULL;
}
int main(int argc, char *argv[]) {
  pthread_attr_t attr;
  pthread_t tid;
  pthread_attr_init(&attr);
  pthread_create(&tid, &attr, (void* (*)(void*)) run, NULL);
  // pthread_join(tid, NULL);
  while (! done)
    ;
}
+zsh:1464> uname -a
Linux spraggett.mtv.corp.google.com 2.6.24-gg23-generic #1 SMP Fri Jan 30
14:07:49 PST 2009 x86_64 GNU/Linux
+zsh:1464> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.7
+zsh:1464> getconf GNU_LIBC_VERSION
glibc 2.7
--
           Summary: clone(CLONE_VM) fails with pthread_getattr_np on i386
           Product: glibc
           Version: 2.8
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: martinrb at google dot com
                CC: glibc-bugs at sources dot redhat dot com
  GCC host triplet: x86_64-unknown-linux-gnu
http://sourceware.org/bugzilla/show_bug.cgi?id=10311
Martin
Thanks, Michael
I'm hoping the following will placate sun studio cc:
diff --git a/src/solaris/native/java/lang/UNIXProcess_md.c b/src/solaris/native/java/lang/UNIXProcess_md.c
--- a/src/solaris/native/java/lang/UNIXProcess_md.c
+++ b/src/solaris/native/java/lang/UNIXProcess_md.c
@@ -651,6 +651,7 @@
     }
     close(FAIL_FILENO);
     _exit(-1);
+    return 0;  /* Suppress warning "no return value from function" */
 }

I'm also adding my manual test case BigFork.java.
It may be helpful while implementing the Solaris version of this feature.
webrev updated.
I need a Sun bug to commit these changes for Linux.  Please create one.
Synopsis: (process) Use clone(CLONE_VM), not fork, on Linux to avoid swap exhaustion
Description:
On Linux it is possible to use clone with CLONE_VM, but not CLONE_THREAD,
which is like fork() but much cheaper and avoids swap exhaustion due to momentary
overcommit of swap space.  One has to be very careful in this case not to mutate
global variables such as environ, but it's worth it.
Evaluation:
Make it so.
See also: 5049299
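
To make the description concrete, here is a rough sketch of the idea, not the code in UNIXProcess_md.c. It spawns a program with clone(CLONE_VM | SIGCHLD) and hands the environment to execve explicitly instead of assigning to the shared global environ. The names spawn_args, exec_child and spawn_and_wait are hypothetical, the 64 KiB child stack is an arbitrary size assumed to be enough for a bare execve call, and the stack is assumed to grow downward.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for clone() and CLONE_VM */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bundle of everything the child needs, so that the
 * child only reads parent memory and never writes globals. */
struct spawn_args {
    const char *path;
    char *const *argv;
    char *const *envp;
};

/* Runs in the child, inside the parent's address space. */
static int exec_child(void *p) {
    const struct spawn_args *a = p;
    /* Pass envp explicitly; do not assign to the global `environ`,
     * because that write would be visible to the parent as well. */
    execve(a->path, a->argv, a->envp);
    _exit(127);                /* reached only if execve failed */
    return 0;                  /* not reached; keeps strict compilers quiet */
}

/* Hypothetical helper: returns the child's exit status (127 if the
 * exec failed), or -1 on clone/waitpid error or abnormal termination. */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[]) {
    const size_t stack_size = 64 * 1024;   /* arbitrary, assumed sufficient */
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    struct spawn_args args = { path, argv, envp };
    /* No CLONE_THREAD: the child is a normal process we can waitpid() on.
     * Assumes a downward-growing stack, hence the top-of-allocation pointer. */
    pid_t pid = clone(exec_child, stack + stack_size, CLONE_VM | SIGCHLD, &args);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    /* args and stack must stay valid until the child has exec'ed or exited;
     * waiting for the child to finish is the simplest way to guarantee that. */
    free(stack);
    if (reaped != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A real implementation would not block until the child exits; it would need some other way to learn that the exec has happened (or failed) before releasing the stack, for example a close-on-exec pipe.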
Once that is done, I will commit my changes.
Thanks,
Martin
On Thu, Jun 11, 2009 at 07:22, Michael McMahon <Michael.McMahon@sun.com> wrote:
Martin Buchholz wrote:
I've run the regression tests on Solaris and Linux and they seem fine.
I broke down and finally created a "proper" webrev,
http://cr.openjdk.java.net/~martin/clone-exec/
just like the good old days.
There is a compile warning on Solaris at line 654: no return value from function.
Aside from that, I'm happy with the change now.
- Michael.