pytorch/torcharrow: High performance model preprocessing library on PyTorch

TorchArrow: a data processing library for PyTorch

This library currently does not have a stable release. The API and implementation may change. Future changes may not be backward compatible.

TorchArrow is a torch.Tensor-like Python DataFrame library for data preprocessing in PyTorch models.

Installation

You will need Python 3.7 or later. We also highly recommend installing a Miniconda environment.

First, set up an environment. If you are using conda, create a conda environment:

conda create --name torcharrow python=3.7
conda activate torcharrow

Version Compatibility

The following table lists the corresponding torch and torcharrow versions and the supported Python versions.

torch           torcharrow      python
main / nightly  main / nightly  >=3.7, <=3.10
1.13.0          0.2.0           >=3.7, <=3.10
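
The Python range in the table can be checked programmatically before installing. The following is a minimal sketch, not part of TorchArrow: the helper name `python_is_supported` is hypothetical, and the bounds are simply copied from the table above.

```python
import sys

# Supported range copied from the compatibility table (inclusive bounds).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def python_is_supported(version=None):
    """Return True if (major, minor) lies within the documented range.

    Hypothetical helper for illustration; TorchArrow does not ship it.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "outside the documented range"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings, where "3.10" would sort before "3.7".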

Colab

Follow the instructions in this Colab notebook

Nightly Binaries

Experimental nightly binaries for macOS (requires macOS SDK >= 10.15) and Linux (requires glibc >= 2.17) are available for Python 3.7, 3.8, and 3.9 and can be installed via pip wheels:

pip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
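
If the wheel fails to install, it may help to confirm that the platform meets the requirements quoted above. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes the running macOS version is a reasonable stand-in for the SDK requirement, which may not hold on every machine.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(found, minimum):
    """Compare two dotted version strings numerically, not lexically."""
    return parse_version(found) >= parse_version(minimum)


def nightly_platform_ok():
    """Best-effort check of the OS requirements for the nightly wheels."""
    system = platform.system()
    if system == "Linux":
        # libc_ver() returns ('glibc', '<version>') on glibc-based systems.
        lib, version = platform.libc_ver()
        return lib == "glibc" and bool(version) and meets_minimum(version, "2.17")
    if system == "Darwin":
        # Assumption: the OS version approximates the SDK requirement.
        version = platform.mac_ver()[0]
        return bool(version) and meets_minimum(version, "10.15")
    return False


if __name__ == "__main__":
    print("Platform meets documented requirements:", nightly_platform_ok())
```

A `False` result only means the documented requirements could not be confirmed; it does not prove the wheel will fail to install.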

From Source

If you are installing from source, you will need Python 3.7 or later and a C++17 compiler.

Get the TorchArrow Source

git clone --recursive https://github.com/pytorch/torcharrow
cd torcharrow

# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive

Install Dependencies

On macOS

Homebrew is required to install development tools on macOS.

Install dependencies from Brew

brew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent

Build and install other dependencies

scripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2

On Ubuntu (20.04 or later)

Install dependencies from APT

apt install -y g++ cmake ccache ninja-build checkinstall \
    libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \
    libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev

Build and install folly and fmt

scripts/setup-ubuntu.sh
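
Before building, a quick check that the build tools from the dependency lists above are on PATH can save a failed build. This is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the tool list assumes that the `ninja-build` package installs its executable as `ninja`.

```python
import shutil

# Executables that the dependency lists above are expected to provide
# on both macOS and Ubuntu. Assumption: ninja-build installs "ninja".
BUILD_TOOLS = ("cmake", "ninja", "ccache")


def missing_tools(tools=BUILD_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked build tools are on PATH")
```

The check covers only executables; libraries such as folly or fmt are not detected this way.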

Install TorchArrow

For local development, you can build in debug mode:

DEBUG=1 python setup.py develop

And run unit tests with

To build and install TorchArrow in release mode:

License

TorchArrow is BSD licensed, as found in the LICENSE file.