Running TensorFlow

Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in Running A Container and specify the registry, repository, and tags.

About this task

On a system with GPU support for NGC containers, when you run a container, the following occurs:

The method implemented in your system depends on the DGX OS version that you installed (for DGX systems), the NGC Cloud Image that was provided by a Cloud Service Provider, or the software that you installed to prepare to run NGC containers on TITAN PCs, Quadro PCs, or NVIDIA Virtual GPUs (vGPUs).

Procedure

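  1. Issue the command for the applicable release of the container that you want. The following command assumes you want to pull the latest container.
    • For TensorFlow version 2.x

      ```
      docker pull nvcr.io/nvidia/tensorflow:25.01-tf2-py3
      ```

  2. Open a command prompt and paste the `pull` command. Ensure that the pull process successfully completes before proceeding to step 3.
  3. Run the container image. In the commands below, `local_dir` is a directory on the host, `container_dir` is the path where it appears inside the container, and the image tag should match the one you pulled in step 1 (for example, `25.01-tf2-py3`).
    • If you have Docker 19.03 or later, which provides the `--gpus` option:

      ```
      docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:<xx.xx>-tf-py
      ```

    • If you have an earlier Docker version that relies on the `nvidia-docker` wrapper:

      ```
      nvidia-docker run -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:<xx.xx>-tf-py
      ```
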
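To run TensorFlow, import it as a Python module. The version printed depends on the container release; the output shown here is from a TensorFlow 1.x image.

```
$ python
>>> import tensorflow as tf
>>> print(tf.__version__)
1.15.0
```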
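
The same check can also be run without an interactive session. The sketch below assumes a TensorFlow 2.x image (where `tf.config.list_physical_devices` is available) and Docker 19.03 or later for `--gpus`. It reuses the image tag from the pull command above and prints the assembled command rather than executing it.

```shell
#!/bin/sh
IMAGE="nvcr.io/nvidia/tensorflow:25.01-tf2-py3"

# One-line Python check: TensorFlow version, then the GPUs it can see.
PY_CHECK="import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"

# No -it here: the container runs the check and exits (--rm removes it).
RUN_CMD="docker run --gpus all --rm ${IMAGE} python -c \"${PY_CHECK}\""

# Print the command for review; paste it into a terminal to run it.
echo "$RUN_CMD"
```

An empty list in the second line of output would suggest that the container cannot see any GPU, which usually points back to the Docker GPU setup rather than to TensorFlow itself.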

To pull data and model descriptions from locations outside the container for use by TensorFlow or save results to locations outside the container, mount one or more host directories as Docker® data volumes.
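
As a minimal sketch of such a mount, the snippet below maps one host directory for input data and one for results into the container. The host paths (`$HOME/datasets`, `$HOME/results`) and the container paths under `/workspace` are hypothetical placeholders, not paths required by the image. The image tag is the one from the pull command above. The command is assembled into a variable and printed so it can be reviewed before use.

```shell
#!/bin/sh
# Hypothetical host directories; replace with your own paths.
HOST_DATA="$HOME/datasets"
HOST_RESULTS="$HOME/results"

# Hypothetical mount points inside the container.
CONTAINER_DATA="/workspace/data"
CONTAINER_RESULTS="/workspace/results"

IMAGE="nvcr.io/nvidia/tensorflow:25.01-tf2-py3"

# Each -v host_dir:container_dir pair mounts one host directory as a volume.
RUN_CMD="docker run --gpus all -it --rm \
-v ${HOST_DATA}:${CONTAINER_DATA} \
-v ${HOST_RESULTS}:${CONTAINER_RESULTS} \
${IMAGE}"

# Print the command for review; paste it into a terminal to launch the container.
echo "$RUN_CMD"
```

Files written under `/workspace/results` inside the container would then persist in `$HOME/results` on the host after the container exits, even though `--rm` removes the container itself.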
Note:
To share data between GPUs, NVIDIA Collective Communications Library (NCCL) might require shared system memory for IPC and pinned (page-locked) system memory resources, so the operating system’s limits on these resources might need to be increased. Refer to your system’s documentation for more information.
In particular, Docker containers default to limited shared and pinned memory resources. When using NCCL inside a container, we recommend that you increase these resources by adding the following options to the `docker run` command line:

```
--shm-size=1g --ulimit memlock=-1
```

Similarly, on some Red Hat Enterprise Linux (RHEL) systems, Docker limits the number of simultaneous PIDs in the container to 4096, which might be too small, particularly for multi-GPU training tasks. To increase this limit, pass the following option to `docker run`: `--pids-limit=8192`
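
Putting the options from this note together, a launch command for a multi-GPU job might look like the sketch below. It assumes Docker 19.03 or later (for `--gpus`) and reuses the image tag from the pull command. The values are the ones suggested above, not tuned recommendations for any particular workload.

```shell
#!/bin/sh
IMAGE="nvcr.io/nvidia/tensorflow:25.01-tf2-py3"

# --shm-size=1g         enlarge the shared memory available for NCCL IPC
# --ulimit memlock=-1   lift the cap on pinned (page-locked) memory
# --pids-limit=8192     raise the PID limit (relevant on some RHEL systems)
RUN_CMD="docker run --gpus all -it --rm \
--shm-size=1g --ulimit memlock=-1 \
--pids-limit=8192 \
${IMAGE}"

# Print the command for review; paste it into a terminal to launch the container.
echo "$RUN_CMD"
```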