Packaging and Testing with Crossbow - Apache Arrow v21.0.0.dev26
The content of the arrow/dev/tasks directory aims to automate Arrow packaging and integration testing.
Packages:
- C++ and Python conda-forge packages for Linux, macOS and Windows
- Python Wheels for Linux, macOS and Windows
- C++ and GLib Linux packages for multiple distributions
- Java for Gandiva
Integration tests:
- Various docker tests
- Pandas
- Dask
- Turbodbc
- HDFS
- Spark
Architecture#
Executors#
Individual jobs are executed on public CI services, currently:
- Linux: GitHub Actions, Travis CI, Azure Pipelines
- macOS: GitHub Actions, Azure Pipelines
- Windows: GitHub Actions, Azure Pipelines
Queue#
Because of how the CI services work, jobs are scheduled through an additional git repository, which acts as a job queue for the tasks. Anyone can host a queue repository (usually named <ghuser>/crossbow).
A job is a git commit on a particular git branch, containing the required configuration files to run the requested builds (like .travis.yml, azure-pipelines.yml, or crossbow.yml for GitHub Actions).
Scheduler#
Crossbow handles version generation, task rendering and submission. The tasks are defined in tasks.yml.
Install#
The following guide depends on GitHub, but theoretically any git server can be used.
If you are not using the ursacomputing/crossbow repository, you will need to complete the first two steps; otherwise proceed to step 3:
1. Create the queue repository
2. Enable Travis CI and Azure Pipelines integrations for the newly created queue repository.
3. Clone either ursacomputing/crossbow if you are using that, or the newly created repository next to the arrow repository:
   By default the scripts look for a crossbow clone next to the arrow directory, but this can be configured through command line arguments.
   git clone https://github.com//crossbow crossbow
   Important note: Crossbow only supports GitHub token based authentication. Although it overwrites repository URLs provided with the ssh protocol, it’s advisable to use the HTTPS repository URLs.
4. Create a Personal Access Token with repo and workflow permissions (other permissions are not needed)
5. Locally export the token as an environment variable:
   export CROSSBOW_GITHUB_TOKEN=
   or pass it as an argument to the CLI script: --github-token
6. Add the previously created GitHub token to Travis CI:
   Use the CROSSBOW_GITHUB_TOKEN encrypted environment variable. You can set it at the following URL, where ghuser is the GitHub username and ghrepo is the GitHub repository name (typically crossbow):
   - https://travis-ci.com/<ghuser>/<ghrepo>/settings
   - Confirm the auto cancellation feature is turned off for branch builds. This should be the default setting.
7. Install Python (minimum supported version is 3.9):
   Miniconda is preferred, see installation instructions:
8. Install the archery toolset containing crossbow itself:
   $ pip install -e "arrow/dev/archery[crossbow]"
9. Try running it:
   $ archery crossbow --help
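Before the first submission it can help to check the prerequisites from the steps above. The following is a minimal sketch, not part of archery: the function name crossbow_preflight is hypothetical, and it only checks two things described in this section, namely that CROSSBOW_GITHUB_TOKEN is exported and that a crossbow directory sits next to the given arrow directory (the default layout). If you pass the token with --github-token or configure a different queue path, the check does not apply as written.

```shell
#!/bin/sh
# Hypothetical helper (not part of archery): checks the install prerequisites
# described above. Takes the path to the arrow clone as its only argument.
crossbow_preflight() {
    arrow_dir=$1

    # The token must be exported, unless it is passed via --github-token instead.
    if [ -z "${CROSSBOW_GITHUB_TOKEN:-}" ]; then
        echo "CROSSBOW_GITHUB_TOKEN is not set" >&2
        return 1
    fi

    # By default the queue clone is expected next to the arrow directory.
    if [ ! -d "$(dirname "$arrow_dir")/crossbow" ]; then
        echo "no crossbow clone found next to $arrow_dir" >&2
        return 1
    fi

    return 0
}
```

One possible use is to gate a dry run on it, for example crossbow_preflight ./arrow && archery crossbow submit --dry-run conda-linux.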
Usage#
The script does the following:
- Detects the current repository, thus supports forks. The following snippet will build kszucs’s fork instead of the upstream apache/arrow repository.
  $ git clone https://github.com/kszucs/arrow
  $ git clone https://github.com/kszucs/crossbow
  $ cd arrow/dev/tasks
  $ archery crossbow submit --help # show the available options
  $ archery crossbow submit conda-win conda-linux conda-osx
- Gets the HEAD commit of the currently checked out branch and generates the version number based on setuptools_scm. So to build a particular branch, check it out before running the script:
  $ git checkout ARROW-
  $ archery crossbow submit --dry-run conda-linux conda-osx
  Note that the arrow branch must be pushed beforehand, because the script will clone the selected branch.
- Reads and renders the required build configurations with the parameters substituted.
- Creates a branch per task, prefixed with the job id. For example, to build conda recipes on Linux, it will create a new branch: crossbow@build-<id>-conda-linux.
- Pushes the modified branches to GitHub, which triggers the builds. For authentication it uses the GitHub OAuth tokens described in the install section.
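Because the version number is generated from the HEAD commit, the version string can be used to double-check which commit a build came from. The sketch below assumes version strings shaped like the sample in the Examples section (0.9.1.dev48+g810a7188.d20180414), where the part after +g is the abbreviated commit hash, as in the default setuptools_scm local-version layout. The helper names version_node and version_matches_commit are hypothetical and not part of archery.

```shell
#!/bin/sh
# Sketch: pull the abbreviated commit hash out of a setuptools_scm-style
# version string such as 0.9.1.dev48+g810a7188.d20180414.
# version_node is a hypothetical helper, not part of archery.
version_node() {
    # "+g<hex>" carries the git node; print only the hex part.
    printf '%s\n' "$1" | sed -n 's/.*+g\([0-9a-f]*\).*/\1/p'
}

# Succeeds when the full commit SHA ($2) starts with the node found in the
# version string ($1), i.e. the version was generated from that commit.
version_matches_commit() {
    node=$(version_node "$1")
    [ -n "$node" ] || return 1
    case "$2" in
        "$node"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A version without a +g part (for example a tagged release) yields no node, so version_matches_commit reports a mismatch instead of guessing.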
Query the build status#
The build id (which has a corresponding branch in the queue repository) is returned by the submit command.
$ archery crossbow status <build id / branch name>
Download the build artifacts#
$ archery crossbow artifacts <build id / branch name>
Examples#
The submit command accepts a list of task names and/or a list of task-group names to select which tasks to build.
Run multiple builds:
$ archery crossbow submit debian-stretch conda-linux-gcc-py37-r40
Repository: https://github.com/kszucs/arrow@tasks
Commit SHA: 810a718836bb3a8cefc053055600bdcc440e6702
Version: 0.9.1.dev48+g810a7188.d20180414
Pushed branches:
- debian-stretch
- conda-linux-gcc-py37-r40
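When submissions are scripted, it can be convenient to pull individual values out of this output. The following sketch assumes the command prints one "Name: value" field per line followed by a "Pushed branches:" list, as in the sample above; the real output may differ between versions. The helper names submit_field and pushed_branches are hypothetical and not part of archery, and submit_field expects a plain field name without regular-expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helpers (not part of archery) that read submit output on stdin.

# Print the value of a "Name: value" line, e.g. submit_field 'Commit SHA'.
submit_field() {
    sed -n "s/^$1: *//p"
}

# Print the branch names listed after the "Pushed branches:" header.
pushed_branches() {
    awk '
        /^Pushed branches:/ { in_list = 1; next }   # the list starts after this header
        in_list && /^[ \t]*-[ \t]/ {                # "- <branch>" entries
            sub(/^[ \t]*-[ \t]*/, "")
            print
            next
        }
        in_list { in_list = 0 }                     # any other line ends the list
    '
}
```

For example, archery crossbow submit --group conda | pushed_branches would list one branch per line, which can then be passed to archery crossbow status.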
Just render without applying or committing the changes:
$ archery crossbow submit --dry-run task_name
Run only conda package builds and a Linux one:
$ archery crossbow submit --group conda centos-7
Run wheel builds:
$ archery crossbow submit --group wheel
There are multiple task groups in tasks.yml, such as docker, integration and cpp-python, for running docker based tests.
archery crossbow submit supports multiple options and arguments; for more, see its help page:
$ archery crossbow submit --help