Troubleshoot Common Problems

This section offers advice on solving problems you might encounter with MATLAB® Parallel Server™ software.

License Errors

When starting a MATLAB worker, a licensing problem might result in the message

License checkout failed. No such FEATURE exists. License Manager Error -5

There are many reasons why you might receive this error:

Memory Errors on UNIX Operating Systems

If the number of processes created by the server services on a machine running a Linux® operating system exceeds the operating system limits, the services fail and generate an out-of-memory error. To avoid this, adjust your system limits. For more information, see Recommended System Limits for Macintosh and Linux (Parallel Computing Toolbox).

Run Server Processes on Windows Network Installation

Many networks are configured not to allow LocalSystem to have access to UNC or mapped network shares. In this case, run the mjs process under a different user with rights to log on as a service. See Set MATLAB Job Scheduler Service User.

Required Ports

For MATLAB Job Scheduler

BASE_PORT. The mjs_def file specifies and describes the ports required by the job manager and all workers. For more information, see the description for the BASE_PORT parameter in Define MATLAB Job Scheduler Startup Parameters.

Communicating Jobs. On worker machines running a UNIX® operating system, the ports required by MPICH for running communicating jobs range from BASE_PORT + 1000 to BASE_PORT + 2000.
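
As a rough illustration of this arithmetic, the following sketch computes the communicating-job port range for a given BASE_PORT. The value 27350 is assumed here only for illustration (it matches the port used in the DNS SRV example later in this topic); substitute the value from your own mjs_def file.

```shell
# Assumed BASE_PORT for illustration; use the value set in your mjs_def file.
BASE_PORT=27350

# Communicating jobs use ports from BASE_PORT + 1000 to BASE_PORT + 2000.
mpich_low=$((BASE_PORT + 1000))
mpich_high=$((BASE_PORT + 2000))

echo "Communicating-job port range: ${mpich_low}-${mpich_high}"
```

A range computed this way can help when you review firewall rules on the worker machines.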

With Third-Party Scheduler

Communication Between Workers. Before the worker processes start, you can control the port range the workers use for communicating jobs. Define the minimum port number using the FI_TCP_PORT_LOW_RANGE environment variable and the maximum port number using the FI_TCP_PORT_HIGH_RANGE environment variable.

Before R2024b: You can control the range of ports used by the workers for communicating jobs by defining the environment variable MPICHPORTRANGE with the value minport:maxport.
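
A minimal sketch of these settings follows. The port numbers 30000 and 30100 are hypothetical, and where you place the lines (for example, in a job submission script or a worker environment setup file) depends on your scheduler. The only requirement taken from the text above is that the variables are defined before the worker processes start.

```shell
# Hypothetical port range; choose one that your firewall allows.
export FI_TCP_PORT_LOW_RANGE=30000
export FI_TCP_PORT_HIGH_RANGE=30100

# Before R2024b: the same range expressed as a single minport:maxport value.
export MPICHPORTRANGE="${FI_TCP_PORT_LOW_RANGE}:${FI_TCP_PORT_HIGH_RANGE}"

echo "MPICHPORTRANGE=${MPICHPORTRANGE}"
```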

Open Ports on Workers for Inbound Communication from Client. You can control the listening port range workers open to connect to clients for interactive parallel pool jobs.

Client Ports

With the pctconfig (Parallel Computing Toolbox) function, you specify the ports used by the client. If the default ports cannot be used, this function allows you to configure ports separately for communication with the job scheduler and communication with a parallel pool.

Ephemeral TCP Ports with MATLAB Job Scheduler

If you use MATLAB Job Scheduler on a cluster of nodes running Windows® operating systems, you must make sure that a large number of ephemeral TCP ports are available on the machine that hosts the job manager. By default, the maximum valid ephemeral TCP port number on a Windows operating system is 5000, but transfers of large data sets might fail if this setting is not increased. In particular, if your cluster has 32 or more workers, you should increase the maximum valid ephemeral TCP port number using the following procedure:

  1. Start the Registry Editor.
  2. Locate the following subkey in the registry, and select it:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  3. On the Registry Editor window, use the menu to create a new DWORD value.
  4. In the list of entries on the right, change the new value name to MaxUserPort and press Enter.
  5. Right-click on the MaxUserPort entry name and select the option to modify its value.
  6. In the Edit DWORD Value dialog, enter 65534 in the Value data field. Under Base select Decimal. Click OK.
    This parameter controls the maximum port number that is used when a program requests any available user port from the system. Typically, ephemeral (short-lived) ports are allocated between the values of 1024 and 5000 inclusive. This action allows allocation for port numbers up to 65534.
  7. Quit the Registry Editor.
  8. Reboot your machine.
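
To see how much headroom this change provides, the following sketch counts the ephemeral ports available before and after the procedure. It uses only the bounds quoted in the steps above (1024, 5000, and 65534); the variable names are illustrative.

```shell
# Bounds quoted in the procedure above (both ends inclusive).
low=1024
default_max=5000
new_max=65534

default_count=$((default_max - low + 1))
new_count=$((new_max - low + 1))

echo "Default: ${default_count} ports; after change: ${new_count} ports"
```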

Host Communications Problems

If a worker is not able to make a connection with its MATLAB Job Scheduler, or if a client session cannot validate a profile that uses that scheduler, this might indicate communications problems between nodes.

With Command-Line Interface

First, be sure that the machines in question agree on their IP resolutions. The IP address that a host resolves for itself should be the same as the address that other hosts resolve for it. For example, if a process on hostB cannot connect to one on hostA, find out the IP address that hostA resolves for itself, then see what IP address hostB resolves for hostA. They should be the same.
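
One way to run this comparison on Linux hosts is sketched below. It assumes the getent utility is available, which is not something this topic specifies; resolve_ip is a hypothetical helper name. Run the script on each machine with the same host name and compare the printed addresses.

```shell
# Print the first address this machine resolves for the given host name.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Host to check; pass the name in question (for example, hostA) as the
# first argument. Defaults to localhost so the sketch runs anywhere.
target_host="${1:-localhost}"
echo "${target_host} resolves to: $(resolve_ip "$target_host")"
```

If the address printed on hostA differs from the one printed on hostB, the name resolution configuration on one of the machines is the likely source of the connection problem.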

If the machines can identify each other, the nodestatus command can be useful for diagnosing problems between their processes. Use the command to determine which MATLAB Parallel Server processes are running on the local host and which are accessible from remote hosts. If a worker on hostA cannot register with its job manager on hostB, run nodestatus on both hosts to see what each can see on hostB.

On hostB, execute:

nodestatus -remotehost hostB

Then on hostA, run exactly the same command:

nodestatus -remotehost hostB

The results should be the same, showing the same listing of job managers and workers.

If the output indicates problems, run the command again with a higher information level to receive more detailed information:

nodestatus -remotehost hostB -infolevel 3
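
If you save the output from each host to a file, a comparison along the following lines can show where the two views differ. This is only a sketch: compare_listings and the file names are hypothetical, and the nodestatus output might contain host-specific lines that differ even when communication is healthy.

```shell
# Compare two saved nodestatus listings; succeed only if they are identical.
compare_listings() {
    if diff -u "$1" "$2"; then
        echo "Listings match"
    else
        echo "Listings differ; rerun nodestatus with -infolevel 3"
        return 1
    fi
}

# Hypothetical usage, after copying both files to one machine:
#   on hostB:  nodestatus -remotehost hostB > hostB_view.txt
#   on hostA:  nodestatus -remotehost hostB > hostA_view.txt
#   compare_listings hostB_view.txt hostA_view.txt
```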

With Admin Center GUI

You can diagnose some communications problems using Admin Center.

If you cannot successfully add hosts to the listing by specifying host name, you can use their IP addresses instead. For more information, see Add Hosts. If you suspect any communications problems, in the Admin Center GUI click Test Connectivity. For more information, see Test MATLAB Job Scheduler Cluster Connectivity in Admin Center. This testing verifies that the nodes can identify each other and allow their processes to communicate with each other.

Verify Network Communications for Cluster Discovery

If you want to use the discover cluster capabilities in Parallel Computing Toolbox, your network must be configured to use DNS SRV or DNS TXT records.

DNS SRV Record

When you use DNS for MATLAB Job Scheduler cluster discovery, you require a DNS SRV record for each domain. You can have multiple DNS SRV records for multiple MATLAB Job Schedulers. Use the following general form for each DNS SRV record.

_mdcs._tcp.<domain> <TTL> IN SRV <priority> <weight> <port> <hostname>.

Construct a DNS SRV record for a MATLAB Job Scheduler server using the following parts.

A valid DNS SRV record for the company.com network running a MATLAB Job Scheduler on machine mjs-1 might look like this:

_mdcs._tcp.company.com 3600 IN SRV 0 0 27350 mjs-1.company.com.
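
To make the fields of this example record explicit, the following sketch splits it into named parts. The variable names follow the standard DNS SRV field layout (name, TTL, class, type, priority, weight, port, target) and are chosen here for illustration; they are not taken from this topic.

```shell
record='_mdcs._tcp.company.com 3600 IN SRV 0 0 27350 mjs-1.company.com.'

# Split the whitespace-separated record into its standard SRV fields.
read -r name ttl class type priority weight port target <<EOF
$record
EOF

echo "name=$name ttl=$ttl class=$class type=$type"
echo "priority=$priority weight=$weight port=$port target=$target"
```

In this example, the port field (27350) is the port on which clients reach the MATLAB Job Scheduler, and the target field is the fully qualified name of the machine that hosts it.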

Note

If multiple domains are required to locate the cluster, use a DNS SRV record for each domain. If the network accessed by users via VPN has different DNS SRV records to your internal network, ensure that a DNS SRV record exists for each domain.

Use the standard procedure for your DNS system to create appropriate DNS SRV records. You can use standard utilities such as the nslookup command to verify that your network is configured with the necessary DNS SRV records. To examine MATLAB Job Scheduler DNS SRV records for the company.com domain, use the following command.

nslookup -type=SRV _mdcs._tcp.company.com

DNS TXT Record

Use DNS TXT records for third-party scheduler cluster discovery. A DNS TXT record associates a text string with a particular domain. To let MATLAB know where to find cluster discovery configuration files, store the locations of cluster discovery configuration files as text strings in DNS TXT records.

You can have multiple DNS TXT records for multiple clusters. Use this general form for each DNS TXT record.

_mdcs._tcp.<domain> IN TXT "discover_folder=<folder>"

Construct a DNS TXT record to discover a third-party scheduler using these parts.

A valid DNS TXT record for the company.com network running a Slurm scheduler cluster with a cluster discovery configuration file stored in /network/share/discovery might look like this:

_mdcs._tcp.company.com IN TXT "discover_folder=/network/share/discovery"

Note

If multiple domains are required to locate the cluster, use a DNS TXT record for each domain. If the network accessed by users via VPN has different DNS TXT records to your internal network, ensure that a DNS TXT record exists for each domain.

Use the standard procedure for your DNS system to create appropriate DNS TXT records. You can use standard utilities such as the nslookup command to verify that your network is configured with the necessary DNS TXT records. To examine DNS TXT records for the company.com domain, use the following command.

nslookup -type=TXT _mdcs._tcp.company.com
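
To check that the returned record points at the folder you expect, you can extract the discover_folder value from the lookup output. The sketch below assumes the quoted TXT string appears on a single line of output; the sample line and the helper name extract_discover_folder are illustrative, and the exact nslookup output format can vary between systems.

```shell
# Print the value of discover_folder from any line containing the TXT string.
extract_discover_folder() {
    sed -n 's/.*"discover_folder=\([^"]*\)".*/\1/p'
}

# Illustrative line in the style of nslookup TXT output (assumed format).
sample='_mdcs._tcp.company.com text = "discover_folder=/network/share/discovery"'
folder=$(printf '%s\n' "$sample" | extract_discover_folder)
echo "Discovery folder: $folder"

# Hypothetical live usage:
#   nslookup -type=TXT _mdcs._tcp.company.com | extract_discover_folder
```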
