Bug #18911: Process._fork hook point is not called when Process.daemon is used - Ruby (original) (raw)
closed
Process._fork hook point is not called when Process.daemon is used
ruby -v:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin20]
Description
Hello there! I'm working at Datadog on the ddtrace gem, and we need to hook into fork operations to make sure that our products work correctly/automatically even in environments that fork.
As part as #17795 a new Process._fork
method was added to allow libraries and frameworks to easily hook into fork operations. I was investigating its use in ddtrace
and noticed the following gap: the Process.daemon
API internally makes use of fork
, but the new hook point is not called for that API.
Testcase:
puts RUBY_DESCRIPTION
module ForkHook
def _fork(*args)
puts " #{Process.pid} Before fork!"
res = super
puts " #{Process.pid} After fork!"
res
end
end
Process.singleton_class.prepend(ForkHook)
puts "#{Process.pid} Regular fork:"
fork { exit }
Process.wait
puts "#{Process.pid} Process.daemon:"
Process.daemon(nil, true)
puts "#{Process.pid} Finishing!"
Testcase output:
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin20]
48136 Regular fork: # <-- original process
48136 Before fork!
48136 After fork! # <-- original process
48137 After fork! # <-- child process
48136 Process.daemon: # <-- original process
48139 Finishing! # <-- forks and pid changes, but the hook isn't called
This was surprising to me since the advantage of this hook point would not not needing to hook into the many other places where fork
can get called from.
Thanks a lot :)
Assignee set to akr (Akira Tanaka)
@akr (Akira Tanaka) suggested special treatment for Process.daemon
, and I followed him when implementing it. So this is by design. However, I forgot the reason why Process._fork
should ignore daemon
. I'll ask him at the next dev meeting.
@ivoanjo Do you have difficulties due to this behavior in terms of ddtrace? Or you were just "surprised"? The motivation is very important to discuss the issue. Process.daemon
stops threads, so I guess you have any difficulties, but I'd like to confirm it before the dev meeting.
@nobu (Nobuyoshi Nakada) said that this is because Process.daemon
does not call fork(2) but daemon(3). We happen to know that daemon(3) calls fork(2) (on some environments), but other unknown C functions calling fork internally cannot be handled. So Process._fork
hooks only events that Ruby calls fork(2) directly.
Thanks @mame (Yusuke Endoh) for the awesomely quick reply :)
@ivoanjo (Ivo Anjo) Do you have difficulties due to this behavior in terms of ddtrace? Or you were just "surprised"? The motivation is very important to discuss the issue. Process.daemon stops threads, so I guess you have any difficulties, but I'd like to confirm it before the dev meeting.
Yeah, it surprised me because it was a situation where there's a fork (albeit indirectly) and threads die, so I needed to do cleanups/restart stuff, but was not covered by the _fork
.
@nobu (Nobuyoshi Nakada) (Nobuyoshi Nakada) said that this is because Process.daemon does not call fork(2) but daemon(3). We happen to know that daemon(3) calls fork(2) (on some environments), but other unknown C functions calling fork internally cannot be handled. So Process._fork hooks only events that Ruby calls fork(2) directly.
I guess this is perhaps more of a discoverability/documentation issue. My understanding was "hooking _fork
was all you needed", but actually daemon needs to be as well. But in practical terms, if I'm hooking Process._fork
, it's not hard to also hook Process.daemon
.
Next to the code implementing _fork
we have the following comment:
/*
...
* This method is not for casual code but for application monitoring
* libraries. You can add custom code before and after fork events
* by overriding this method.
*/
VALUE
rb_proc__fork(VALUE _obj)
...would it be reasonable to add a "Note: Process#daemon is similar to fork, but does not go through this method." or something similar?
...would it be reasonable to add a "Note: Process#daemon is similar to fork, but does not go through this method." or something similar?
Let's go with this! Could you please send a PR?
At the dev meeting, I briefly talked with @akr (Akira Tanaka) about this issue.
A main motivation of Process._fork
discussed in #17795 was to disconnect the database connections before fork. This was to prevent corrupted communication if the parent and child processes accessed the database connection in parallel after fork. Process.daemon
uses fork(2) but does not cause this problem because the parent process exits immediately after fork. Thus, for those who override Process._fork
for the original purpose, it would be more natural for Process._fork
not to be invoked in Process.daemon
. Unfortunately, this design does not fit with your use case to restart a monitoring thread after fork.
ivoanjo (Ivo Anjo) wrote in <#note-4>:
Yeah, it surprised me because it was a situation where there's a fork (albeit indirectly) and threads die, so I needed to do cleanups/restart stuff, but was not covered by the
_fork
.
Just FYI, according to @akr (Akira Tanaka), some OSes supports daemon(3) more directly; daemon(3) does not use "double fork" hack internally on a such OS.
- Status changed from Open to Closed
Note for future readers coming here from the Process._fork
documentation.
If you need to restart threads, you should override both _fork
and daemon
module RestartWatcherThread
def _fork
pid = super
restart_thread if pid == 0
pid
end
def daemon(...)
super.tap{ restart_thread }
end
prepend_features(Process.singleton_class)
end
Like0
Like0Like0Like0Like0Like0Like0Like0Like0
Loading...