Running on Cloud TPUs
Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips specialized for ML training. See the official tutorials for running the T2T Transformer for text on Cloud TPUs and Transformer for Speech Recognition.
Other models on TPU
Many of Tensor2Tensor’s models work on TPU.
You can provision a VM and TPU with ctpu up. Use the t2t-trainer command on the VM as usual with the additional flags --use_tpu and --cloud_tpu_name=$TPU_NAME.
Note that because the TPUEstimator does not catch the OutOfRangeError during evaluation, you should ensure that --eval_steps is small enough to not exhaust the evaluation data.
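One way to read the constraint above: the number of evaluation steps times the evaluation batch size should not exceed the number of evaluation examples. The following is a minimal sketch of that arithmetic, assuming each evaluation step consumes one full batch. The CIFAR-10 test set has 10,000 images; the batch size of 1024 is an illustrative assumption, not a value taken from any particular hparams set.

```shell
# Number of examples in the evaluation split (CIFAR-10 test set).
NUM_EVAL_EXAMPLES=10000
# Assumed evaluation batch size; check the hparams set you actually use.
EVAL_BATCH_SIZE=1024

# Integer division gives the largest step count that cannot run past
# the end of the evaluation data.
MAX_EVAL_STEPS=$(( NUM_EVAL_EXAMPLES / EVAL_BATCH_SIZE ))
echo "--eval_steps should be at most $MAX_EVAL_STEPS"
```

Under these assumptions the bound is 9 steps; any value at or below the computed bound avoids reading past the end of the evaluation data.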
A non-exhaustive list of T2T models that work on TPU:
- Image generation: imagetransformer with imagetransformer_base_tpu (or imagetransformer_tiny_tpu)
- Super-resolution: img2img_transformer with img2img_transformer_base_tpu (or img2img_transformer_tiny_tpu)
- resnet with resnet_50 (or resnet_18 or resnet_34)
- revnet with revnet_104 (or revnet_38_cifar)
- shake_shake with shakeshake_tpu (or shakeshake_small)
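The model and hparams-set pairings in the list above can be sketched as a small lookup, which may help when scripting runs over several models. The function name tpu_hparams_set_for is hypothetical and is not part of Tensor2Tensor; it only returns the first hparams set named for each model in the list.

```shell
# Hypothetical helper (not part of Tensor2Tensor): map a model name from
# the list above to the first TPU hparams set listed for it.
tpu_hparams_set_for() {
  case "$1" in
    imagetransformer)    echo "imagetransformer_base_tpu" ;;
    img2img_transformer) echo "img2img_transformer_base_tpu" ;;
    resnet)              echo "resnet_50" ;;
    revnet)              echo "revnet_104" ;;
    shake_shake)         echo "shakeshake_tpu" ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Build the two flags for one of the listed models.
MODEL=revnet
HPARAMS_SET=$(tpu_hparams_set_for "$MODEL")
echo "--model=$MODEL --hparams_set=$HPARAMS_SET"
```

The printed flags would still need a matching --problem and the TPU flags described earlier before being passed to t2t-trainer.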
Example invocation
Use ctpu up to bring up the VM and TPU machines; once the machines are ready, it will SSH you into the VM and you can run the following:
# DATA_DIR and OUT_DIR should be GCS buckets
# TPU_NAME should have been set automatically by the ctpu tool
t2t-trainer \
  --model=shake_shake \
  --hparams_set=shakeshake_tpu \
  --problem=image_cifar10 \
  --train_steps=180000 \
  --eval_steps=9 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME
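Since the invocation above depends on three environment variables, a small pre-flight check can catch an unset TPU_NAME or a local path passed by mistake. This is a sketch only: is_gcs_path and check_tpu_env are hypothetical helper names, not part of t2t-trainer or ctpu, and the check only tests for the gs:// prefix rather than confirming that the bucket exists.

```shell
# Hypothetical helper: succeed only if the argument looks like a GCS path.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical helper: verify the variables used by the invocation above.
check_tpu_env() {
  if [ -z "${TPU_NAME:-}" ]; then
    echo "TPU_NAME is not set" >&2
    return 1
  fi
  for dir in "${DATA_DIR:-}" "${OUT_DIR:-}"; do
    if ! is_gcs_path "$dir"; then
      echo "not a GCS path: '$dir'" >&2
      return 1
    fi
  done
  return 0
}
```

Running check_tpu_env before t2t-trainer, and launching the trainer only if it succeeds, would surface these mistakes before any TPU time is spent.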