Batch service workflow and resources - Azure Batch

In this overview of the core components of the Azure Batch service, we discuss the high-level workflow that Batch developers can use to build large-scale parallel compute solutions, along with the primary service resources that are used.

Whether you're developing a distributed computational application or service that issues direct REST API calls or you're using one of the Batch SDKs, you'll use many of the resources and features discussed here.

Basic workflow

The following high-level workflow is typical of nearly all applications and services that use the Batch service for processing parallel workloads:

  1. Upload the data files that you want to process to an Azure Storage account. Batch includes built-in support for accessing Azure Blob storage, and your tasks can download these files to compute nodes when the tasks are run.
  2. Upload the application files that your tasks will run. These files can be binaries or scripts and their dependencies, and are executed by the tasks in your jobs. Your tasks can download these files from your Storage account, or you can use the application packages feature of Batch for application management and deployment.
  3. Create a pool of compute nodes. When you create a pool, you specify the number of compute nodes for the pool, their size, and the operating system. When each task in your job runs, it's assigned to execute on one of the nodes in your pool.
  4. Create a job. A job manages a collection of tasks. You associate each job with a specific pool where that job's tasks will run.
  5. Add tasks to the job. Each task runs the application or script that you uploaded to process the data files it downloads from your Storage account. As each task completes, it can upload its output to Azure Storage.
  6. Monitor job progress and retrieve the task output from Azure Storage.
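
The six steps above can be sketched as a small in-memory model that shows how pools, jobs, and tasks relate to one another. This is only an illustration under stated assumptions: it does not call the Batch service or any Batch SDK, and every name in it (Pool, Job, Task, run_job, run_workflow, the dict standing in for Azure Storage, the round-robin node assignment, and the upper-casing "application") is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, List, Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a pool: node count, size, and OS."""
    pool_id: str
    node_count: int
    vm_size: str
    os_name: str

    def node_ids(self) -> List[str]:
        return [f"{self.pool_id}-node{i}" for i in range(self.node_count)]


@dataclass
class Task:
    """Hypothetical stand-in for a task that processes one input file."""
    task_id: str
    input_blob: str
    node_id: Optional[str] = None
    output_blob: Optional[str] = None
    state: str = "active"


@dataclass
class Job:
    """Hypothetical stand-in for a job: a collection of tasks bound to one pool."""
    job_id: str
    pool: Pool
    tasks: List[Task] = field(default_factory=list)


def run_job(job: Job, storage: Dict[str, str]) -> None:
    # Assumption: tasks are handed to nodes round-robin. The real service
    # makes its own scheduling decisions; this only shows that each task
    # ends up on one node of the job's pool.
    nodes = cycle(job.pool.node_ids())
    for task in job.tasks:
        task.node_id = next(nodes)
        data = storage[task.input_blob]        # task downloads its input
        result = data.upper()                  # stand-in for the uploaded application
        task.output_blob = f"output/{task.task_id}.txt"
        storage[task.output_blob] = result     # task uploads its output
        task.state = "completed"


def run_workflow():
    # Steps 1 and 2: "upload" data files (a dict plays the role of storage).
    storage = {"input/0.txt": "alpha", "input/1.txt": "beta", "input/2.txt": "gamma"}
    # Step 3: create a pool, specifying node count, size, and operating system.
    pool = Pool("demo-pool", node_count=2, vm_size="small", os_name="linux")
    # Step 4: create a job associated with that pool.
    job = Job("demo-job", pool)
    # Step 5: add one task per input file.
    for i in range(3):
        job.tasks.append(Task(f"task-{i}", f"input/{i}.txt"))
    run_job(job, storage)
    return job, storage


if __name__ == "__main__":
    job, storage = run_workflow()
    # Step 6: monitor task state and retrieve output from storage.
    for task in job.tasks:
        print(task.task_id, task.state, task.node_id, storage[task.output_blob])
```

In a real solution, each of these stand-ins would map to a call against the Batch service and Azure Storage. The sketch only mirrors the order of operations and the relationship in which a job is tied to one pool and each task runs on one of that pool's nodes.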

Note

You need a Batch account to use the Batch service. Most Batch solutions also use an associated Azure Storage account for file storage and retrieval.

Batch service resources

The following topics discuss the resources of Batch that enable your distributed computational scenarios.
