Run on Clusters — mars 0.10.0+21.g0a42ba8 documentation
Basic Steps
Mars can be deployed on a cluster. First, install Mars and the dependencies needed for distributed execution on every node in the cluster. After that, select one node as the supervisor, which also hosts the web service, and leave the other nodes as workers.
The supervisor can be started with the following command:
mars-supervisor -H <host_name> -p <supervisor_port> -w <web_port>
The web service will be started as well.
Workers can be started with the following command:
mars-worker -H <host_name> -p <worker_port> -s <supervisor_ip>:<supervisor_port>
After all Mars processes are started, you can open a Python console and run:
import mars
import mars.tensor as mt
import mars.dataframe as md

# create a default session that connects to the cluster
mars.new_session('http://<web_ip>:<web_port>')
a = mt.random.rand(2000, 2000, chunk_size=200)
b = mt.inner(a, a)
b.execute()  # submit tensor to cluster
df = md.DataFrame(a).sum()
df.execute()  # submit DataFrame to cluster
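If a script starts the cluster and connects right away, the web service may not be listening yet. The sketch below shows one way to wait for it using only the standard library. It is not part of Mars: `wait_for_port` is a hypothetical helper, and the host and port passed to it stand in for your own `<web_ip>` and `<web_port>`.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of Mars.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port('192.168.1.10', 7005)` could be called before `mars.new_session('http://192.168.1.10:7005')`, assuming the supervisor was started with `-w 7005` on that address.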
You can open http://<web_ip>:<web_port> in a web browser to view the Mars UI, which shows the resource usage of workers and the execution progress of the task just submitted.
Using Command Lines
When running Mars from the command line, you can specify arguments to control the behavior of Mars processes. All Mars services share the common arguments listed below.
Argument | Description |
---|---|
-H | Service IP binding, 0.0.0.0 by default |
-p | Port of the service. If absent, a randomized port will be used |
-f | Path to the service configuration file. Omit it to use the default configuration |
-s | List of supervisor endpoints, separated by commas. Useful for workers and web services to locate supervisors, or when you want to run more than one supervisor |
--log-level | Log level, can be debug, info, warning, error |
--log-format | Log format, given as a Python logging format string |
--log-conf | Python logging configuration file, logging.conf by default |
--use-uvloop | Whether to use uvloop to accelerate, auto by default |
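To make the `--log-format` option more concrete, here is a sketch of how a Python logging format string renders a record. It uses only the standard `logging` module. The format strings and the logger name `mars.example` are illustrative choices, not Mars defaults, and `render` is a hypothetical helper.

```python
import logging

# An illustrative format string; this is not claimed to be Mars' default.
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"


def render(fmt: str, message: str, level: int = logging.INFO) -> str:
    """Format a single log record with the given format string."""
    record = logging.LogRecord(
        name="mars.example",  # hypothetical logger name
        level=level,
        pathname="<example>",
        lineno=0,
        msg=message,
        args=(),
        exc_info=None,
    )
    return logging.Formatter(fmt).format(record)


# A shorter format without the timestamp, so the result is deterministic.
print(render("%(name)s %(levelname)s %(message)s", "worker started"))
# prints: mars.example INFO worker started
```

A string of this kind is what the table describes as a value for `--log-format`; remember to quote it for your shell, since it contains parentheses and spaces.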
Extra arguments for supervisors are listed below.
Argument | Description |
---|---|
-w | Port of web service in supervisor |
Extra arguments for workers are listed below. Details about memory tuning can be found in the next section.
For instance, to start a Mars cluster with two supervisors and two workers, you can run the commands below (memory and CPU tuning options are omitted):
On Supervisor 1 (192.168.1.10):
mars-supervisor -H 192.168.1.10 -p 7001 -w 7005 -s 192.168.1.10:7001,192.168.1.11:7002
On Supervisor 2 (192.168.1.11):
mars-supervisor -H 192.168.1.11 -p 7002 -s 192.168.1.10:7001,192.168.1.11:7002
On Worker 1 (192.168.1.20):
mars-worker -H 192.168.1.20 -p 7003 -s 192.168.1.10:7001,192.168.1.11:7002
On Worker 2 (192.168.1.21):
mars-worker -H 192.168.1.21 -p 7004 -s 192.168.1.10:7001,192.168.1.11:7002
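The four commands above share the same `-s` list, so they can be generated from a single description of the cluster. The following sketch only builds the command strings and does not start anything. The helper names `endpoints`, `supervisor_command`, and `worker_command` are hypothetical, and only the flags `-H`, `-p`, `-w`, and `-s` described in this section are used.

```python
import shlex
from typing import List, Optional, Tuple

Node = Tuple[str, int]  # (host, port)


def endpoints(supervisors: List[Node]) -> str:
    """Join supervisor (host, port) pairs into the comma-separated -s value."""
    return ",".join(f"{host}:{port}" for host, port in supervisors)


def supervisor_command(node: Node, supervisors: List[Node],
                       web_port: Optional[int] = None) -> str:
    """Build a mars-supervisor command line; -w is added only when given."""
    host, port = node
    argv = ["mars-supervisor", "-H", host, "-p", str(port)]
    if web_port is not None:
        argv += ["-w", str(web_port)]
    argv += ["-s", endpoints(supervisors)]
    return shlex.join(argv)


def worker_command(node: Node, supervisors: List[Node]) -> str:
    """Build a mars-worker command line pointing at all supervisors."""
    host, port = node
    return shlex.join(
        ["mars-worker", "-H", host, "-p", str(port), "-s", endpoints(supervisors)]
    )


# The cluster from the example above.
supervisors = [("192.168.1.10", 7001), ("192.168.1.11", 7002)]
workers = [("192.168.1.20", 7003), ("192.168.1.21", 7004)]

commands = [
    supervisor_command(supervisors[0], supervisors, web_port=7005),
    supervisor_command(supervisors[1], supervisors),
]
commands += [worker_command(w, supervisors) for w in workers]
for command in commands:
    print(command)
```

Keeping the supervisor list in one place avoids the easy mistake of giving different nodes different `-s` values when a supervisor is added or its port changes.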