Sample RuntimeError when cores>1
September 8, 2020, 3:28am 1
I’m new to PyMC3 and only started looking into it over the past week. I have been working through a couple of quick examples, but ran into a RuntimeError when running sample. I only receive the error when I set cores > 1, so this looks like a threading issue. I’m not sure whether I have something installed wrong or whether this is a true bug.
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import warnings
from pymc3 import Model, Normal, Uniform
from pymc3 import sample
warnings.filterwarnings("ignore")
radon = pd.read_csv('C:/Users/Craig/source/repos/pymc_test/radon.csv', index_col=0)
radon_hennepin = radon.query('county=="HENNEPIN"').log_radon
print(radon_hennepin.head())
with Model() as radon_model:
    mu = Normal('mu', mu=0, sd=10)
    sigma = Uniform('sigma', 0, 10)
with radon_model:
    dist = Normal('dist', mu=mu, sigma=sigma, observed=radon_hennepin)
with radon_model:
    samples = sample(1000, tune=1000, cores=4, random_seed=42)
When cores is set to 1, the code executes fine. However, if cores is set to anything greater than 1, as in the example above, I receive the following error:
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, mu]
Traceback (most recent call last):
File "<ipython-input-5-afe32d11c49f>", line 1, in <module>
runfile('C:/Users/Craig/source/repos/pymc_test/pymc3_test_spyder.py', wdir='C:/Users/Craig/source/repos/pymc_test')
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Craig/source/repos/pymc_test/pymc3_test_spyder.py", line 22, in <module>
samples = sample(1000, tune=1000, cores=4, random_seed=42)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\pymc3\sampling.py", line 437, in sample
trace = _mp_sample(**sample_args)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\pymc3\sampling.py", line 965, in _mp_sample
chain, progressbar)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\pymc3\parallel_sampling.py", line 361, in __init__
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\pymc3\parallel_sampling.py", line 361, in <listcomp>
for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Users\Craig\Anaconda3\envs\envTF1\lib\site-packages\pymc3\parallel_sampling.py", line 251, in __init__
raise exc
RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child's exception and traceback appears to be lost.
I’ve run the code above in both MS Visual Studio and Spyder. I have the following installed in the environment:
Python == 3.6
Pandas == 0.25.0
Numpy == 1.15.4
pymc3 == 3.7
Hi Craig,
Are you using Windows? If so, I think this is expected: multi-threading can’t be used on this OS, if I’m not mistaken (cc @aseyboldt).
Some people have commented that WSL on Windows has worked well, if you’d like to give that a try.
aseyboldt September 9, 2020, 1:11pm 4
We had some multiprocessing issues on Windows recently, but they should be fixed, or at least give much better error messages, now. Can you update pymc3 and try again?
aseyboldt September 9, 2020, 1:12pm 5
@AlexAndorra yes it can
The model needs to be picklable on Windows and macOS (but not on Linux). If it is, everything should work fine.
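A quick way to test the picklability requirement mentioned above is to try serialising the object yourself. The sketch below uses only the standard-library pickle module. The helper name is_picklable is hypothetical, and it assumes that a plain pickle.dumps call is a reasonable stand-in for whatever PyMC3 sends to its worker processes, which may not hold for every version or pickle backend.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard-library pickle can serialise obj.

    Hypothetical helper: this only approximates what a sampler has to
    do when it ships a model to a spawned worker process.
    """
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# Plain data structures pickle without trouble.
print(is_picklable({"mu": 0.0, "sigma": 10.0}))

# Lambdas cannot be pickled by the standard library, so an object that
# holds one would not survive the trip to a spawned worker.
print(is_picklable(lambda x: x + 1))
```

In a real session you would pass the model object itself, for example is_picklable(radon_model), after the model has been built.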
Yes, I am running this on Windows. I tried to update pymc3 to version 3.8, but ran into a bunch of conflicts in the environment. I’ll create a new environment, update to 3.8, and try again.
Thanks for the quick reply.
Daaaaamn, I can’t get the hang of these OS issues! Thanks for always picking me up @aseyboldt
@cevans3098, the latest version is 3.9.3, not 3.8. If you’re on conda, use the conda-forge channel, as the main channel hasn’t been updated to the latest version yet: conda install pymc3 -c conda-forge
@AlexAndorra I upgraded to version 3.9.3 but I am still unable to run on more than 1 core. I get the following traceback:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, mu]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "F:\anaconda3\envs\envTensorFlow\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "F:\anaconda3\envs\envTensorFlow\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "F:\anaconda3\envs\envTensorFlow\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\Users\Craig\source\repos\pymc_test\pymc_test\pymc_test.py", line 22, in <module>
samples = sample(1000, tune=1000, cores=2, random_seed=42)
File "F:\anaconda3\envs\envTensorFlow\lib\site-packages\pymc3\sampling.py", line 545, in sample
trace = _mp_sample(**sample_args, **parallel_args)
File "F:\anaconda3\envs\envTensorFlow\lib\site-packages\pymc3\sampling.py", line 1481, in _mp_sample
pickle_backend=pickle_backend,
File "F:\anaconda3\envs\envTensorFlow\lib\site-packages\pymc3\parallel_sampling.py", line 454, in __init__
for chain, seed, start in zip(range(chains), seeds, start_points)
File "F:\anaconda3\envs\envTensorFlow\lib\site-packages\pymc3\parallel_sampling.py", line 454, in <listcomp>
for chain, seed, start in zip(range(chains), seeds, start_points)
File "F:\anaconda3\envs\envTensorFlow\lib\site-packages\pymc3\parallel_sampling.py", line 294, in __init__
self._process.start()
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "F:\anaconda3\envs\envTensorFlow\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Can you try the new mp_ctx kwarg: pm.sample(mp_ctx="forkserver") or pm.sample(mp_ctx="spawn")?
aseyboldt September 11, 2020, 8:15am 10
If this is just a normal Python script (and not a notebook), you need to protect your final code using the standard if __name__ == '__main__' guard. We are using the multiprocessing module from the standard library, and this will re-import (“execute”) your whole script again, but with __name__ set to a different value. If that worker then also executes the pm.sample() function, it would again start worker processes, which would execute the script…
So your script should look something like this:
import pandas as pd
import pymc3 as pm

def main():
    # load data
    radon = pd.read_csv('C:/Users/Craig/source/repos/pymc_test/radon.csv', index_col=0)
    radon_hennepin = radon.query('county=="HENNEPIN"').log_radon
    print(radon_hennepin.head())

    # build and sample model
    with pm.Model() as radon_model:
        mu = pm.Normal('mu', mu=0, sd=10)
        sigma = pm.Uniform('sigma', 0, 10)
        dist = pm.Normal('dist', mu=mu, sigma=sigma, observed=radon_hennepin)
        samples = pm.sample(1000, tune=1000, cores=4, random_seed=42)

# only the process that was launched directly calls main();
# re-imported worker processes skip it
if __name__ == '__main__':
    main()
PS: It is best practice to use import pymc3 as pm instead of importing individual names.
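The re-import behaviour described in this post can be reproduced without PyMC3. The sketch below is only an illustration. It writes a tiny stand-in script to a temporary file and executes it with runpy.run_path, once under the name "__main__" and once under "__mp_main__", which is the run_name that appears in the spawn.py frames of the traceback above. The script contents and the helper name run_script_as are made up for the demonstration, and the guarded block stands in for the pm.sample() call.

```python
import os
import runpy
import tempfile

# Stand-in script: the guarded block represents the pm.sample() call.
SCRIPT = '''
executed = ["top-level"]          # runs every time the script is executed
if __name__ == "__main__":
    executed.append("guarded")    # runs only when launched directly
'''


def run_script_as(run_name):
    """Execute SCRIPT from a temporary file with __name__ set to run_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        # run_path returns the globals of the executed script
        module_globals = runpy.run_path(path, run_name=run_name)
    return module_globals["executed"]


as_main = run_script_as("__main__")       # what the parent process sees
as_worker = run_script_as("__mp_main__")  # what a spawned worker sees

print(as_main)    # ['top-level', 'guarded']
print(as_worker)  # ['top-level']
```

Everything outside the guard runs in both cases, which is why an unguarded sample call ends up being executed again inside each worker.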
@aseyboldt that fixed the problem… lesson learned for me not to take a shortcut. Once I added the if __name__ == '__main__' guard, everything worked.
Thanks again!
I am also getting this error. Unfortunately, I cannot always put my code within an if __name__ == '__main__' block.
Here’s what I found on Windows.
I tried pm.sample(mp_ctx="forkserver") but I got ValueError: cannot find context for 'forkserver'.
The other option, pm.sample(mp_ctx="spawn"), didn’t work either. I got the freeze error on that one.
On any of my Linux systems this is entirely a non-issue and I don’t have to think about it. Unfortunately, I am required to use Windows sometimes.
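The ValueError for "forkserver" reported above comes from the standard-library multiprocessing module: each platform supports only some start methods. As a minimal sketch, the snippet below lists the methods available on the current machine and checks whether a given context can be created. The helper name context_or_none is hypothetical. The printed list depends on the operating system, so no output is shown.

```python
import multiprocessing as mp


def context_or_none(method):
    """Return a multiprocessing context for method, or None if unsupported.

    Hypothetical helper: mp.get_context raises ValueError when the
    start method is not available on this platform.
    """
    try:
        return mp.get_context(method)
    except ValueError:
        return None


# Start methods supported by this interpreter on this operating system.
methods = mp.get_all_start_methods()
print(methods)

# Check a method before passing its name as the mp_ctx argument.
print(context_or_none("forkserver") is not None)
```

Checking this first shows which values of mp_ctx are worth trying on a given machine. It does not remove the need for the if __name__ == '__main__' guard when the "spawn" method is in use.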