clone-exec update:

I submitted the changes for this, but jtreg tests failed on 32-bit Linux
(I had only tested on 64-bit Linux).

We disabled (but did not roll back) the use of clone to allow the
TL integration to proceed.


(As I promised elsewhere...)

I just filed a bug against upstream glibc demonstrating the problem
with clone(CLONE_VM).  You can see the small C program
in my bug report below.

Probably any discussion related just to the glibc bug
can occur on the public glibc bugzilla at

http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311



glibc maintainer Uli Drepper has already responded, saying
"If you use clone() you're on your own."
so if we are going to fix it, we'll have to do it ourselves.
Help from threading/kernel hackers appreciated.



Thanks much,



Martin


---------- Forwarded message ----------
From: martinrb at google dot com <sourceware-bugzilla@sourceware.org>

Date: Mon, Jun 22, 2009 at 12:23
Subject: [Bug nptl/10311] New: clone(CLONE_VM) fails with pthread_getattr_np on i386
To: martinrb@google.com


I'm using clone() with flags CLONE_VM, but not CLONE_THREAD.
(background: I'm trying to solve the ancient overcommit failure
when spawning a small Unix process from a big process).

The act of calling clone appears to mess up the pthread library,
but only on i386, not on x86_64, using glibc version 2.7
(The bugzilla Version drop-down does not allow one to specify 2.7;
y'all should fix that)

Here's a shell transcript containing a program
that demonstrates the problem, and shows that
the problem does not occur when running in 64-bit mode
on 64-bit Linux.  (The problem also occurs when running in 32-bit mode
on 32-bit Linux).

A program like this would be a fine addition to the glibc test suite.



$ set -x; for flag in -m32 -m64; do gcc $flag -lpthread ./clone_bug.c &&
./a.out; done; cat clone_bug.c; uname -a; getconf GNU_LIBPTHREAD_VERSION;
getconf GNU_LIBC_VERSION
+zsh:1464> set -x
+zsh:1464> flag=-m32
+zsh:1464> gcc -m32 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
count=2, pthread_getattr_np failed with errno = "No such process"
+zsh:1464> flag=-m64
+zsh:1464> gcc -m64 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
+zsh:1464> cat clone_bug.c

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>
#include <wait.h>
#include <errno.h>
#include <unistd.h>
#include <pthread.h>
#include <syscall.h>
#include <sched.h>

static void
debugPrint(char *format, ...) {
  FILE *tty = fopen("/dev/tty", "w");
  va_list ap;
  va_start(ap, format);
  vfprintf(tty, format, ap);
  va_end(ap);
  fclose(tty);
}

static void debugPids(void) {
//   debugPrint("getpid()=%d gettid()=%d, syscall(getpid)=%d pthread_self=%d\n",
//              getpid(), syscall(SYS_gettid), syscall(SYS_getpid), pthread_self());
  static int count = 0;
  pthread_attr_t attr;
  int result;
  ++count;
  if ((result = pthread_getattr_np(pthread_self(), &attr)) != 0)
    debugPrint("count=%d, pthread_getattr_np failed with errno = \"%s\"\n",
               count, strerror(result));
}

static int childProcess(void *ignored) {
  _exit(0);
  // debugPrint("child\n");
  // execve("/bin/true", NULL, NULL);
  // perror("execve");
}

// I'm sure there's a better way to do this,
// but pthread_join ain't it - we can't trust it.
volatile int done = 0;

void* run(void *x) {
  const int stack_size = 1024 * 1024;
  void *clone_stack = malloc(2 * stack_size);
  int status;
  debugPids();
  int pid = clone(childProcess, clone_stack + stack_size,
                  CLONE_VM | SIGCHLD, NULL);
  waitpid(pid, &status, 0);
  debugPids();
  done = 1;
  pthread_exit(0);
  return NULL;
}

int main(int argc, char *argv[]) {
  pthread_attr_t attr;
  pthread_t tid;

  pthread_attr_init(&attr);
  pthread_create(&tid, &attr, (void* (*)(void*)) run, NULL);
  // pthread_join(tid, NULL);
  while (! done)
    ;
}

+zsh:1464> uname -a
Linux spraggett.mtv.corp.google.com 2.6.24-gg23-generic #1 SMP Fri Jan 30
14:07:49 PST 2009 x86_64 GNU/Linux
+zsh:1464> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.7
+zsh:1464> getconf GNU_LIBC_VERSION
glibc 2.7



--
           Summary: clone(CLONE_VM) fails with pthread_getattr_np on i386
           Product: glibc
           Version: 2.8
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: martinrb at google dot com
                CC: glibc-bugs at sources dot redhat dot com
  GCC host triplet: x86_64-unknown-linux-gnu





http://sourceware.org/bugzilla/show_bug.cgi?id=10311


Martin


On Thu, Jun 11, 2009 at 14:16, Martin Buchholz <martinrb@google.com> wrote:
Thanks, Michael

I'm hoping the following will placate Sun Studio cc:

diff --git a/src/solaris/native/java/lang/UNIXProcess_md.c b/src/solaris/native/java/lang/UNIXProcess_md.c
--- a/src/solaris/native/java/lang/UNIXProcess_md.c
+++ b/src/solaris/native/java/lang/UNIXProcess_md.c
@@ -651,6 +651,7 @@
     }
     close(FAIL_FILENO);
     _exit(-1);
+    return 0;  /* Suppress warning "no return value from function" */
 }


I'm also adding my manual test case BigFork.java.
It may be helpful while implementing the Solaris version of this feature.

webrev updated.

I need a Sun bug to commit these changes for Linux.  Please create one.


Synopsis: (process) Use clone(CLONE_VM), not fork, on Linux to avoid swap exhaustion


Description:
On Linux it is possible to use clone with CLONE_VM, but not CLONE_THREAD,
which is like fork() but much cheaper and avoids swap exhaustion due to momentary
overcommit of swap space.  One has to be very careful in this case to not mutate global
variables such as environ, but it's worth it.

Evaluation:
Make it so.

See also: 5049299
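
As an illustration of the approach in the description above (this is not the code from the webrev), here is a minimal C sketch of spawning a program with clone(CLONE_VM | SIGCHLD) followed by execve. The names spawn_shared_vm, spawn_child and struct spawn_args are hypothetical, and the 64 KiB child stack is an assumed size. The child gets its environment as an explicit envp argument, so it never needs to touch the shared environ.

```c
#define _GNU_SOURCE   /* exposes clone() and CLONE_VM in <sched.h> */
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bundle of execve arguments handed to the child. */
struct spawn_args {
    const char *path;
    char *const *argv;
    char *const *envp;
};

/* Runs in the child, which shares the parent's address space.
 * It only calls execve and _exit and writes no global state. */
static int spawn_child(void *p) {
    const struct spawn_args *a = p;
    execve(a->path, a->argv, a->envp);
    _exit(127);   /* reached only if execve failed */
    return 0;     /* not reached; keeps strict compilers quiet */
}

/* Returns the child's exit status, or -1 on failure or abnormal exit. */
static int spawn_shared_vm(const char *path, char *const argv[],
                           char *const envp[]) {
    const size_t stack_size = 64 * 1024;   /* assumed to be enough */
    char *stack = malloc(stack_size);
    struct spawn_args args = { path, argv, envp };
    int status = 0;
    pid_t pid, w;

    if (stack == NULL)
        return -1;
    /* The stack grows down on x86, so pass the top of the block. */
    pid = clone(spawn_child, stack + stack_size, CLONE_VM | SIGCHLD, &args);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    /* args and stack must stay alive until the child has exec'd or
     * exited; waiting for the child here guarantees that. */
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);
    free(stack);
    if (w != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller would pass a path, an argv array and an explicit envp array, for example "/bin/sh" with { "sh", "-c", "exit 7", NULL } and an empty environment. Because the child shares thread-local storage with the calling thread, a failed execve in the child can overwrite the caller's errno, and glibc versions affected by the bug reported above may still misbehave after the clone call.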

Once that is done, I will commit my changes.

Thanks,


Martin




On Thu, Jun 11, 2009 at 07:22, Michael McMahon <Michael.McMahon@sun.com> wrote:

Martin Buchholz wrote:

I broke down and finally created a "proper" webrev,
just like the good old days.

http://cr.openjdk.java.net/~martin/clone-exec/

I've run the regression tests on Solaris and Linux and they seem fine.
There is a compile warning on Solaris at line 654: no return value from function.
Aside from that, I'm happy with the change now.

- Michael.