#14752 - sort fails to fork() + execlp(compress_program) if overcommit limit is reached (original) (raw)
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 14752 in the body.
You can then email your comments to 14752 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Sun, 30 Jun 2013 05:23:02 GMT) Full text and rfc822 format available.
Acknowledgement sentto Petros Aggelatos <petrosagg <at> gmail.com>
:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org
. (Sun, 30 Jun 2013 05:23:02 GMT) Full text and rfc822 format available.
Message #5 received at submit debbugs.gnu.org (full text, mbox):
I was trying to sort a big file (22GB, 5GB gzipped) with sort --compress-program=gzip -S40% data
. My /tmp filesystem is a 6GB tmpfs
and the total system RAM is 16GB. The problem was that after a while
sort would write uncompressed temp files in /tmp causing it to fill up
and then crash for having no free space.
After messing around with sort.c I found out that fork() would fail
after the first batches were written successfully in /tmp with errno
ENOMEM. I found out that I was hitting my kernel's VM overcommit limit
as the tmpfs utilisation grew. And indeed, running sysctl -w vm.overcommit=1
fixed the problem and the file was sorted.
As sort is a program with high memory consumption and could be running in environments where overcommit is either unavailable or disabled I would expect it to use another method for spawning its compress_program. I'm not a C developer, so I'm not sure this is possible, but vfork() and clone() that seem to solve this problem.
-- Petros Angelatos
Reply sentto Bernhard Voelker <mail <at> bernhard-voelker.de>
:
You have taken responsibility. (Mon, 01 Jul 2013 07:17:02 GMT) Full text and rfc822 format available.
Notification sentto Petros Aggelatos <petrosagg <at> gmail.com>
:
bug acknowledged by developer. (Mon, 01 Jul 2013 07:17:03 GMT) Full text and rfc822 format available.
Message #10 received at 14752-done debbugs.gnu.org (full text, mbox):
tag 14752 notabug close 14752 stop
On 06/30/2013 03:42 AM, Petros Aggelatos wrote:
I was trying to sort a big file (22GB, 5GB gzipped) with
sort --compress-program=gzip -S40% data
. My /tmp filesystem is a 6GB tmpfs and the total system RAM is 16GB. The problem was that after a while sort would write uncompressed temp files in /tmp causing it to fill up and then crash for having no free space.
Thanks for reporting this. However, I think that your system's memory is just too small for sorting that file (that way, see below).
You already recognized yourself that sort(1) was writing huge chunk files into the /tmp directory which is a tmpfs file system, i.e., that all that data is decreasing the memory available for running processes. The overhead for spawning a new process is negligible compared to such an amount of data.
In such a case, you're much better off telling sort(1) to use a different directory for the temporary files.
Here's an excerpt from the texinfo manual (info coreutils 'sort invocation'):
If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of /tmp'. The
--temporary-directory' (`-T') option in turn overrides the environment
variable.
...
-T TEMPDIR'
--temporary-directory=TEMPDIR'
Use directory TEMPDIR to store temporary files, overriding the
`TMPDIR' environment variable. If this option is given more than
once, temporary files are stored in all the directories given. If
you have a large sort or merge that is I/O-bound, you can often
improve performance by using this option to specify directories on
different disks and controllers.
Have a nice day, Berny
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 01 Jul 2013 09:53:02 GMT) Full text and rfc822 format available.
Message #13 received at 14752 debbugs.gnu.org (full text, mbox):
On 07/01/2013 08:16 AM, Bernhard Voelker wrote:
tag 14752 notabug close 14752 stop
On 06/30/2013 03:42 AM, Petros Aggelatos wrote:
I was trying to sort a big file (22GB, 5GB gzipped) with
sort --compress-program=gzip -S40% data
. My /tmp filesystem is a 6GB tmpfs and the total system RAM is 16GB. The problem was that after a while sort would write uncompressed temp files in /tmp causing it to fill up and then crash for having no free space.Thanks for reporting this. However, I think that your system's memory is just too small for sorting that file (that way, see below).
I'm not so sure there is nothing we might change here. Petros said that the sort succeeded when overcommit was set to "always overcommit". Some notes on fork() overcommit behavior: http://lkml.indiana.edu/hypermail/linux/kernel/0902.1/02389.html
Petros, for completeness, what kernel were you using, and was SELinux enabled?
You already recognized yourself that sort(1) was writing huge chunk files into the /tmp directory which is a tmpfs file system, i.e., that all that data is decreasing the memory available for running processes. The overhead for spawning a new process is negligible compared to such an amount of data.
In such a case, you're much better off telling sort(1) to use a different directory for the temporary files.
Here's an excerpt from the texinfo manual (info coreutils 'sort invocation'):
If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of
/tmp'. The
--temporary-directory' (`-T') option in turn overrides the environment variable....
-T TEMPDIR'
--temporary-directory=TEMPDIR' Use directory TEMPDIR to store temporary files, overriding the `TMPDIR' environment variable. If this option is given more than once, temporary files are stored in all the directories given. If you have a large sort or merge that is I/O-bound, you can often improve performance by using this option to specify directories on different disks and controllers.
Note tmpfs is backed by RAM and swap, so depending on the swapiness settings for the kernel, it will auto migrate to the swap device(s) under RAM pressure.
BTW, -S40% may be the root of the issue. Petros have you tried smaller buffers, which would probably avoid the issue on fork(), but also may take advantage of cache locality. I.E. sort currently takes advantage of a 2 level memory hierarchy by allocating large RAM buffers by default, and assuming /tmp is the next storage level down. By reducing the mem buffers down to take advantage of ever increasing cache sizes further up the hierarchy, may increase performance while avoiding the fork() issue. I previously made some notes on this here: http://lists.gnu.org/archive/html/bug-coreutils/2010-03/msg00008.html
vfork() might be an option here. One can't rely on it being different to fork(), and it blocks the parent until the exec() in the child, and there are various restrictions on the child, but that might be fine? But I think posix_spawn() is the new standardised equivalent, and I notice the spawn-pipe gnulib module which might be leverged here?
cheers, Pádraig.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 01 Jul 2013 10:34:02 GMT) Full text and rfc822 format available.
Message #16 received at 14752 debbugs.gnu.org (full text, mbox):
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 07/01/2013 12:52 PM, Pádraig Brady wrote:
On 07/01/2013 08:16 AM, Bernhard Voelker wrote: Petros, for completeness, what kernel were you using, and was SELinux enabled? I'm running kernel 3.9.7-1-ARCH and SELinux is disabled.
Note tmpfs is backed by RAM and swap, so depending on the swapiness settings for the kernel, it will auto migrate to the swap device(s) under RAM pressure. I have swap disabled on my system as 16GB or RAM seem more than enough. The problem is irrelevant whether it is using tmpfs or not, it just happened to trigger the error on my machine as it grew. But one can reproduce by disabling overcommit and running a sort with compression and -S60%.
BTW, -S40% may be the root of the issue. Petros have you tried smaller buffers, which would probably avoid the issue on fork(), but also may take advantage of cache locality. I tried using smaller buffers but I needed more temp space which I didn't have. If the initial cutting of the file results in more than NMERGE chunks then in the worst case it will need: N + NMERGE * P * N / (NMERGE + 1) where N is the input size and P is the number of merge-sorts running in parallel. But by using -S40% the input file would be split in less than 16 chunks so it would be just a single 16-way merge and require just N size of temp space.
vfork() might be an option here. One can't rely on it being different to fork(), and it blocks the parent until the exec() in the child, and there are various restrictions on the child, but that might be fine? But I think posix_spawn() is the new standardised equivalent, and I notice the spawn-pipe gnulib module which might be leverged here? I did some further research on this after the initial report and posix_spawn() seems to me that is the standard way too. I could write a patch and test it.
Regards, Petros -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJR0VrnAAoJEHebBSKmldDLOasH/3iSJzO2AMU2LxKPKviSuMTz 5519trFD83e38C9I0o1ta830hofPXcpjoEHcinbc5VXibcLyvpfigSrcXoJa9FDZ pHvdOVJELFq4Rd4bXfDoVGoFZ79oNAUAZyR/tg2lapL0TKAiLF6hLOdG2WTtbdmM A1bOFIfjOWjC3Ro2aN7IPs4n1QTPYwLlmKCc4zCRWyl2B3m6EeiLN+qdKWNLpf/3 g8XoGv4yHRQENzk2s2o1aBcPyJ9FLtyQy/HlheNHThc9k+ReJe8JJ89h2UXhFKiT xgE62oNpPi3PL0w8SGuziTQgIKOgRbalQEZafUbeW2HB0ZiBKjZiY9o2PWh3UzM= =J1rs -----END PGP SIGNATURE-----
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 01 Jul 2013 11:15:03 GMT) Full text and rfc822 format available.
Message #19 received at 14752 debbugs.gnu.org (full text, mbox):
On 07/01/2013 11:33 AM, Petros Aggelatos wrote:
On 07/01/2013 12:52 PM, Pádraig Brady wrote:
On 07/01/2013 08:16 AM, Bernhard Voelker wrote: Petros, for completeness, what kernel were you using, and was SELinux enabled? I'm running kernel 3.9.7-1-ARCH and SELinux is disabled.
Note tmpfs is backed by RAM and swap, so depending on the swapiness settings for the kernel, it will auto migrate to the swap device(s) under RAM pressure. I have swap disabled on my system as 16GB or RAM seem more than enough. The problem is irrelevant whether it is using tmpfs or not, it just happened to trigger the error on my machine as it grew. But one can reproduce by disabling overcommit and running a sort with compression and -S60%.
BTW, -S40% may be the root of the issue. Petros have you tried smaller buffers, which would probably avoid the issue on fork(), but also may take advantage of cache locality. I tried using smaller buffers but I needed more temp space which I didn't have. If the initial cutting of the file results in more than NMERGE chunks then in the worst case it will need: N + NMERGE * P * N / (NMERGE + 1) where N is the input size and P is the number of merge-sorts running in parallel. But by using -S40% the input file would be split in less than 16 chunks so it would be just a single 16-way merge and require just N size of temp space.
vfork() might be an option here. One can't rely on it being different to fork(), and it blocks the parent until the exec() in the child, and there are various restrictions on the child, but that might be fine? But I think posix_spawn() is the new standardised equivalent, and I notice the spawn-pipe gnulib module which might be leverged here? I did some further research on this after the initial report and posix_spawn() seems to me that is the standard way too. I could write a patch and test it.
Patches would be gladly accepted!
The higher level spawn-pipe module might not be appropriate, requiring use of posix_spawn() directly. In any case I notice that both are already incorporated in coreutils, through:
posix-shell posix_spawn-internal posix_spawn_file_actions_addclose posix_spawn_file_actions_addclose-tests posix_spawn_file_actions_adddup2 posix_spawn_file_actions_adddup2-tests posix_spawn_file_actions_addopen posix_spawn_file_actions_addopen-tests posix_spawn_file_actions_destroy posix_spawn_file_actions_init posix_spawnattr_destroy posix_spawnattr_init posix_spawnattr_setflags posix_spawnattr_setsigmask posix_spawnp posix_spawnp-tests sigaction spawn spawn-pipe
As a side note I notice that there is no specific cygwin implementation, though one seems to be available at http://cygwin.com/ml/cygwin/2012-01/msg00032.html
thanks! Pádraig.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 01 Jul 2013 23:46:02 GMT) Full text and rfc822 format available.
Message #22 received at 14752 debbugs.gnu.org (full text, mbox):
Following up on this issue it seems that posix_spawn() cannot be used, at least not trivially. The issue is that posix_spawn decides if it will spawn the new process with fork() of vfork() based on some conditions, one of which is if file_actions is NULL.
http://repo.or.cz/w/glibc.git/blob/HEAD:/sysdeps/posix/spawni.c#l105
For the temp compression to work it is nessesary to pass the file descriptors of the pipe from the parent to the child. I'm not sure how to proceed, I found this relevant thread that proposes to relax the restrictions and use vfork more often:
http://sourceware.org/bugzilla/show_bug.cgi?id=10354
And this thread http://sourceware.org/ml/libc-help/2010-10/msg00001.html of someone having the same problem and proposing two solutions. Solution #1 seems to me that adds a lot of complexity. Solution #2 is hacky, and I'm not aware if there are unwanted sideffects of using the enviroment to transfer the FDs.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Tue, 02 Jul 2013 00:11:02 GMT) Full text and rfc822 format available.
Message #25 received at 14752 debbugs.gnu.org (full text, mbox):
On 07/02/2013 12:45 AM, Petros Aggelatos wrote:
Following up on this issue it seems that posix_spawn() cannot be used, at least not trivially. The issue is that posix_spawn decides if it will spawn the new process with fork() of vfork() based on some conditions, one of which is if file_actions is NULL.
http://repo.or.cz/w/glibc.git/blob/HEAD:/sysdeps/posix/spawni.c#l105
Ugh, that's unfortunate :( That restriction seems a bit too harsh on cursory glance.
For the temp compression to work it is nessesary to pass the file descriptors of the pipe from the parent to the child. I'm not sure how to proceed, I found this relevant thread that proposes to relax the restrictions and use vfork more often:
http://sourceware.org/bugzilla/show_bug.cgi?id=10354
And this thread http://sourceware.org/ml/libc-help/2010-10/msg00001.html of someone having the same problem and proposing two solutions. Solution #1 seems to me that adds a lot of complexity. Solution #2 is hacky, and I'm not aware if there are unwanted sideffects of using the enviroment to transfer the FDs.
Thanks for taking the time to find these previous discussions.
Perhaps the best first approach is to use vfork() directly, so see if we actually do hit limitations in practice. If not, then perhaps we could adjust the gnulib spawni.c code accordingly, and add that as input to the above glibc bug for eventual incorporation into glibc.
cheers, Pádraig.
**bug archived.**Request was from Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
. (Tue, 30 Jul 2013 11:24:04 GMT) Full text and rfc822 format available.
**bug unarchived.**Request was from Pádraig Brady <P <at> draigBrady.com>
to control <at> debbugs.gnu.org
. (Mon, 26 May 2014 20:47:02 GMT) Full text and rfc822 format available.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 26 May 2014 20:49:01 GMT) Full text and rfc822 format available.
Message #32 received at 14752 debbugs.gnu.org (full text, mbox):
On 05/26/2014 09:13 PM, Azat Khuzhin wrote:
sort already have one when it trying to create decompressor, it is obvious why it is really required in this case, since sort will read compressed data as plain otherwise. But sometimes it is really usefull to know whether sort failed to create compressor or not, since some users may rely on available free space and compressor.
- src/sort.c (create_temp_file): Add a warning when creating of compressor failed.
There is some old discussion about this http://osdir.com/ml/bug-coreutils-gnu/2013-07/msg00010.html, but before this will be fixed(?) we could print a warning on fail at least. Thanks.
src/sort.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/sort.c b/src/sort.c index 49caae5..eb1b1f3 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1133,6 +1133,13 @@ maybe_create_temp (FILE **pfp, bool survive_fd_exhaustion)
async_safe_die (errno, "couldn't execute compress program"); }
else
{
error (0, errno,
_("warning: couldn't create process for %s "
"(try to install overcommit always)"),
compress_program);
}
}
*pfp = fdopen (tempfd, "w");
Thanks for the patch.
Note POSIX says that programs shouldn't output to stderr unless they're exiting with a failure code, I guess to avoid gradual accretion of warnings etc. which could impair general usage.
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
thanks, Pádraig.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 26 May 2014 21:01:01 GMT) Full text and rfc822 format available.
Message #35 received at 14752 debbugs.gnu.org (full text, mbox):
On Mon, May 26, 2014 at 09:43:23PM +0100, Pádraig Brady wrote:
On 05/26/2014 09:13 PM, Azat Khuzhin wrote:
sort already have one when it trying to create decompressor, it is obvious why it is really required in this case, since sort will read compressed data as plain otherwise. But sometimes it is really usefull to know whether sort failed to create compressor or not, since some users may rely on available free space and compressor.
- src/sort.c (create_temp_file): Add a warning when creating of compressor failed.
There is some old discussion about this http://osdir.com/ml/bug-coreutils-gnu/2013-07/msg00010.html, but before this will be fixed(?) we could print a warning on fail at least. Thanks.
src/sort.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/sort.c b/src/sort.c index 49caae5..eb1b1f3 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1133,6 +1133,13 @@ maybe_create_temp (FILE **pfp, bool survive_fd_exhaustion)
async_safe_die (errno, "couldn't execute compress program"); }
else
{
error (0, errno,
_("warning: couldn't create process for %s "
"(try to install overcommit always)"),
compress_program);
}
}
*pfp = fdopen (tempfd, "w");
Thanks for the patch.
Note POSIX says that programs shouldn't output to stderr unless they're exiting with a failure code,
Hm, didn't know that, Thanks! (Sometimes POSIX is not so good for human, I suppose.)
I guess to avoid gradual accretion of warnings etc. which could impair general usage.
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
Yes this is the same as I linked above. Does any body have a patch for this, or should I start working on this?
Thanks.
thanks, Pádraig.
-- Respectfully Azat Khuzhin
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Mon, 26 May 2014 21:12:01 GMT) Full text and rfc822 format available.
Message #38 received at 14752 debbugs.gnu.org (full text, mbox):
On 05/26/2014 10:00 PM, Azat Khuzhin wrote:
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
Yes this is the same as I linked above. Does any body have a patch for this, or should I start working on this?
I was waiting for a patch that didn't materialize. I'll have a look myself now.
thanks, Pádraig.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Wed, 28 May 2014 15:17:02 GMT) Full text and rfc822 format available.
Message #41 received at 14752 debbugs.gnu.org (full text, mbox):
On 05/26/2014 10:10 PM, Pádraig Brady wrote:
On 05/26/2014 10:00 PM, Azat Khuzhin wrote:
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
Yes this is the same as I linked above. Does any body have a patch for this, or should I start working on this?
I was waiting for a patch that didn't materialize. I'll have a look myself now.
So I had a look and the change while definitely worth doing is a bit invasive and so probably not appropriate for the impending release, as that's focusing on bug fixes rather than performance characteristics.
Some implementation notes for reference...
vfork() is portable only when one essentially just does an execve() right after the vfork(). Therefore just for fire and forget processes. Anything where you need to interact with the sub process like setting up files to communicate etc. is going to have portability issues. Even using execvp() is problematic I understand. Also sort is multithreaded which might further complicate things.
Leveraging posix_spawn() is promising, as the main reason for that interface is to provide an efficient fork()+exec() mechanism. That can be implemented using vfork() or with clone() on Linux. The implementation in glibc is only a token one however as it just uses fork()+exec() usually. One can override with the POSIX_SPAWN_USEVFORK flag, but there are many non obvious implementation gotchas with doing that. Again being multithreaded may complicate things here. Note the musl, freebsd, osx and solaris posix_spawn() implementations are efficient, which would be another reason to use this assuming the glibc/gnulib implementation is fixed up.
Another option would be for sort(1) to start up a helper child process, before we allocate much memory. Then we could communicate descriptors back and forth to that, and it could deal with forking the children. That would be portable too, but a little involved. Ideally we could keep the complication within posix_spawn() instead.
Note this is a general issue not just related to sort(1). Many servers for example whether written in java/python/ruby or whatever have this issue when they use lots of memory and would like to popen() something. So fixing up the glibc posix_spawn() implementation would be very useful so that popen() etc. within glibc and the various language runtimes could leverage.
Some links for reference:
http://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html https://sourceware.org/ml/libc-help/2010-10/msg00001.html https://sourceware.org/bugzilla/show_bug.cgi?id=10354 https://sourceware.org/bugzilla/show_bug.cgi?id=378 http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c http://ewontfix.com/7/ http://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux http://stackoverflow.com/questions/8152076/spawn-process-from-multithreaded-application https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/ http://code.google.com/p/popen-noshell/source/browse/trunk/popen_noshell.c
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Wed, 28 May 2014 16:24:02 GMT) Full text and rfc822 format available.
Message #44 received at 14752 debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
Anything where you need to interact with the sub process like setting up files to communicate etc. is going to have portability issues. Even using execvp() is problematic I understand.
As long as the child doesn't touch parent memory that the parent needs, it should be OK. There is a memory leak in execvp in old glibc versions, but I expect that isn't something we need to worry about.
Last time I checked, vfork was significantly faster than fork when the parent process has a lot of memory, and was still worth using for its performance advantages.
**bug archived.**Request was from Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
. (Thu, 26 Jun 2014 11:24:04 GMT) Full text and rfc822 format available.
**bug unarchived.**Request was from Pádraig Brady <P <at> draigBrady.com>
to control <at> debbugs.gnu.org
. (Thu, 16 Jul 2015 03:07:02 GMT) Full text and rfc822 format available.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Thu, 16 Jul 2015 03:11:02 GMT) Full text and rfc822 format available.
Message #51 received at 14752 debbugs.gnu.org (full text, mbox):
On 16/07/15 03:00, Azat Khuzhin wrote:
On Wed, May 28, 2014 at 04:15:51PM +0100, Pádraig Brady wrote:
On 05/26/2014 10:10 PM, Pádraig Brady wrote:
On 05/26/2014 10:00 PM, Azat Khuzhin wrote:
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
Yes this is the same as I linked above. Does any body have a patch for this, or should I start working on this?
I was waiting for a patch that didn't materialize. I'll have a look myself now.
So I had a look and the change while definitely worth doing is a bit invasive and so probably not appropriate for the impending release, as that's focusing on bug fixes rather than performance characteristics.
Some implementation notes for reference...
Hi Pádraig,
Thanks for your vision/thoughts about this problem, they are very detailed.
vfork() is portable only when one essentially just does an execve() right after the vfork(). Therefore just for fire and forget processes. Anything where you need to interact with the sub process like setting up files to communicate etc. is going to have portability issues. Even using execvp() is problematic I understand. Also sort is multithreaded which might further complicate things.
Agree.
Leveraging posix_spawn() is promising, as the main reason for that interface is to provide an efficient fork()+exec() mechanism. That can be implemented using vfork() or with clone() on Linux. The implementation in glibc is only a token one however as it just uses fork()+exec() usually. One can override with the POSIX_SPAWN_USEVFORK flag, but there are many non obvious implementation gotchas with doing that. Again being multithreaded may complicate things here. Note the musl, freebsd, osx and solaris posix_spawn() implementations are efficient, which would be another reason to use this assuming the glibc/gnulib implementation is fixed up.
AFAIK posix_spawn() in glibc will be something like fork() if you need to setup files, IOW it seems that sort can't posix_spawn() for compress-prog to fix this problem:
__spawni: ... if ((flags & POSIX_SPAWN_USEVFORK) != 0 /* If no major work is done, allow using vfork. Note that we might perform the path searching. But this would be done by a call to execvp(), too, and such a call must be OK according to POSIX. */ || ((flags & (POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF | POSIX_SPAWN_SETSCHEDPARAM | POSIX_SPAWN_SETSCHEDULER | POSIX_SPAWN_SETPGROUP | POSIX_SPAWN_RESETIDS)) == 0 && file_actions == NULL)) new_pid = __vfork (); else new_pid = __fork ();
While sort must use file_actions, no?
Am I missing something?
Right the glibc implementation currently doesn't benefit here. Though it's no worse, and other platforms benefit. There has been a little movement on the associated glibc bug since: https://sourceware.org/bugzilla/show_bug.cgi?id=10354
Another option would be for sort(1) to start up a helper child process, before we allocate much memory. Then we could communicate descriptors back and forth to that, and it could deal with forking the children. That would be portable too, but a little involved. Ideally we could keep the complication within posix_spawn() instead.
So I guess that this is the only choice, and since there is no activity since 2014, I have WIP/RFC patch that implements this, it passes some basic tests, but not all (I'm working on it) and also needs in cleanup and prettying, as for now maybe somebody have some comments?
You could find patches in attachments and via git: git github.com:azat/coreutils.git sort-compress-prog-fixes-v2
Thanks for picking this up again. It is quite involved as expected.
Note this is a general issue not just related to sort(1). Many servers for example whether written in java/python/ruby or whatever have this issue when they use lots of memory and would like to popen() something. So fixing up the glibc posix_spawn() implementation would be very useful so that popen() etc. within glibc and the various language runtimes could leverage.
With the above general benefits in mind, and the complexity of the sort fork server implementation, I'm inclined to think working on posix_spawn() in glibc is more beneficial, and might actually be as easy. Also sort(1) can be moved to using posix_spawn() independently.
Some links for reference:
http://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html https://sourceware.org/ml/libc-help/2010-10/msg00001.html https://sourceware.org/bugzilla/show_bug.cgi?id=10354 https://sourceware.org/bugzilla/show_bug.cgi?id=378 http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c http://ewontfix.com/7/ http://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux http://stackoverflow.com/questions/8152076/spawn-process-from-multithreaded-application https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/ http://code.google.com/p/popen-noshell/source/browse/trunk/popen_noshell.c
thanks! Pádraig.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Thu, 16 Jul 2015 03:13:01 GMT) Full text and rfc822 format available.
Message #54 received at 14752 debbugs.gnu.org (full text, mbox):
On 16/07/15 03:56, Azat Khuzhin wrote:
On Thu, Jul 16, 2015 at 03:46:55AM +0100, Pádraig Brady wrote:
With the above general benefits in mind, and the complexity of the sort fork server implementation, I'm inclined to think working on posix_spawn() in glibc is more beneficial, and might actually be as easy. Also sort(1) can be moved to using posix_spawn() independently.
Yes it will be more general benefit, but I don't think (and this is how it works) that every custom server will switch to it, but I agree that this shouldn't be a problem for highly used servers. And also after posix_spawn will be fixed sort will require new glibc to fix this issue, and hence some distros (like debian) couldn't ship "sort" with fix for this issue without glibc (but I totally agree with you about how this must be fixed -- in more generic basis).
Note we've some flexibility with gnulib to replace posix_spawn() on platforms where we determine the system version doesn't suffice.
Anyway, let's get back to "sort", any chances that that patches (more cleaner and less buggy of course) could be picked for upstream?
Maybe, though sort(1) is complex enough already, so I'd still prefer at least to look at the more general solution.
And thanks for link to the bug about posix_spawn I will look it more closely.
thanks, Pádraig.
**bug archived.**Request was from Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
. (Thu, 13 Aug 2015 11:24:04 GMT) Full text and rfc822 format available.
**bug unarchived.**Request was from Pádraig Brady <P <at> draigBrady.com>
to control <at> debbugs.gnu.org
. (Wed, 03 Feb 2016 05:16:01 GMT) Full text and rfc822 format available.
Information forwardedto bug-coreutils <at> gnu.org
:bug#14752
; Package coreutils
. (Wed, 03 Feb 2016 05:19:02 GMT) Full text and rfc822 format available.
Message #61 received at 14752 debbugs.gnu.org (full text, mbox):
unarchive 14752
On 28/05/14 08:15, Pádraig Brady wrote:
On 05/26/2014 10:10 PM, Pádraig Brady wrote:
On 05/26/2014 10:00 PM, Azat Khuzhin wrote:
So the issue here is that sort is allocating a large buffer up front thus impacting the fork(). Really sort(1) should be trying to avoid this issue in the first place, and the issue is already logged at: http://bugs.gnu.org/14752
Yes this is the same as I linked above. Does any body have a patch for this, or should I start working on this?
I was waiting for a patch that didn't materialize. I'll have a look myself now.
So I had a look and the change while definitely worth doing is a bit invasive and so probably not appropriate for the impending release, as that's focusing on bug fixes rather than performance characteristics.
Some implementation notes for reference...
vfork() is portable only when one essentially just does an execve() right after the vfork(). Therefore just for fire and forget processes. Anything where you need to interact with the sub process like setting up files to communicate etc. is going to have portability issues. Even using execvp() is problematic I understand. Also sort is multithreaded which might further complicate things.
Leveraging posix_spawn() is promising, as the main reason for that interface is to provide an efficient fork()+exec() mechanism. That can be implemented using vfork() or with clone() on Linux. The implementation in glibc is only a token one however as it just uses fork()+exec() usually. One can override with the POSIX_SPAWN_USEVFORK flag, but there are many non obvious implementation gotchas with doing that. Again being multithreaded may complicate things here. Note the musl, freebsd, osx and solaris posix_spawn() implementations are efficient, which would be another reason to use this assuming the glibc/gnulib implementation is fixed up.
Progress on the glibc posix_spawn() front: https://sourceware.org/ml/libc-alpha/2016-02/msg00016.html
Another option would be for sort(1) to start up a helper child process, before we allocate much memory. Then we could communicate descriptors back and forth to that, and it could deal with forking the children. That would be portable too, but a little involved. Ideally we could keep the complication within posix_spawn() instead.
Note this is a general issue not just related to sort(1). Many servers for example whether written in java/python/ruby or whatever have this issue when they use lots of memory and would like to popen() something. So fixing up the glibc posix_spawn() implementation would be very useful so that popen() etc. within glibc and the various language runtimes could leverage.
Some links for reference:
http://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html https://sourceware.org/ml/libc-help/2010-10/msg00001.html https://sourceware.org/bugzilla/show_bug.cgi?id=10354 https://sourceware.org/bugzilla/show_bug.cgi?id=378 http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c http://ewontfix.com/7/ http://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux http://stackoverflow.com/questions/8152076/spawn-process-from-multithreaded-application https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/ http://code.google.com/p/popen-noshell/source/browse/trunk/popen_noshell.c
**bug archived.**Request was from Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
. (Wed, 02 Mar 2016 12:24:04 GMT) Full text and rfc822 format available.
This bug report was last modified 9 years and 105 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.