Building Arrow Java — Apache Arrow v21.0.0.dev26

System Setup#

Arrow Java uses the Maven build system.

Building requires:

Note

CI will test all supported JDK LTS versions, plus the latest non-LTS version.
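
Before starting a build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of Arrow: `have_tool` is a hypothetical helper name, and it only checks that a command can be found, not that its version is supported.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arrow): succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each build tool; this checks presence only, not the version.
for tool in java mvn; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```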

Building#

All the instructions below assume that you have cloned the Arrow git repository:

$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
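
To check that the submodule step took effect, one option is to count the entries that `git submodule status` still marks as uninitialized (it prefixes those lines with `-`). This is a sketch under that assumption; `count_uninitialized` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `git submodule status` output on stdin and prints
# the number of submodules that are not initialized (lines starting with "-").
count_uninitialized() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c '^-' || true
}

# Usage (inside the cloned repository):
#   git submodule status --recursive | count_uninitialized
```

A result of 0 suggests every submodule has been initialized.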

These are the options available to compile Arrow Java modules with:

Building Java Modules#

To build the default modules, go to the project root and execute:

Maven#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your Java home>
$ java --version
$ mvn clean install
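
Since the build depends on JAVA_HOME pointing at a JDK, a small guard before invoking Maven can catch an unset or mistyped path early. The sketch below assumes that a JDK home contains an executable `bin/java`; `is_jdk_home` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given directory looks like a JDK home,
# i.e. it contains an executable bin/java.
is_jdk_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Check the current environment before running `mvn clean install`.
if is_jdk_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/java" >&2
fi
```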

Docker compose#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your Java home>
$ java --version
$ docker compose run java

Archery#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your Java home>
$ java --version
$ archery docker run java

Building JNI Libraries (*.dylib / *.so / *.dll)#

First, we need to build the C++ shared libraries that the JNI bindings will use. We can build these manually, or we can use Archery to build them in a Docker container (this requires installing Docker, Docker Compose, and Archery).

Note

If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.

If you are building on Windows OS, see Developing on Windows.
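
The extension of the resulting JNI library depends on the platform, as the section heading indicates. As an illustration only, the sketch below maps the output of `uname -s` to the expected extension; `jni_lib_ext` is a hypothetical helper, and the Windows patterns assume a Git Bash, MSYS2, or Cygwin shell.

```shell
#!/bin/sh
# Hypothetical helper: maps an OS name (as printed by `uname -s`) to the
# shared-library extension named in the section heading.
jni_lib_ext() {
    case "$1" in
        Darwin)               echo "dylib" ;;
        Linux)                echo "so" ;;
        MINGW*|MSYS*|CYGWIN*) echo "dll" ;;
        *)                    echo "unknown"; return 1 ;;
    esac
}

# Print the extension expected on the current machine.
echo "expected JNI library extension: $(jni_lib_ext "$(uname -s)" || true)"
```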

Maven#

CMake#

# Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ brew uninstall aws-sdk-cpp
# (We can't use aws-sdk-cpp installed by Homebrew because it has
# an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
$ export JAVA_HOME=<absolute path to your Java home>
$ mkdir -p java-dist cpp-jni
$ cmake \
    -S cpp \
    -B cpp-jni \
    -DARROW_BUILD_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_DEPENDENCY_SOURCE=BUNDLED \
    -DARROW_DEPENDENCY_USE_SHARED=OFF \
    -DARROW_FILESYSTEM=ON \
    -DARROW_GANDIVA=ON \
    -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
    -DARROW_JSON=ON \
    -DARROW_ORC=ON \
    -DARROW_PARQUET=ON \
    -DARROW_S3=ON \
    -DARROW_SUBSTRAIT=ON \
    -DARROW_USE_CCACHE=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
$ cmake \
    -S java \
    -B java-jni \
    -DARROW_JAVA_JNI_ENABLE_C=OFF \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_PREFIX_PATH=$PWD/java-dist \
    -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
    -DProtobuf_USE_STATIC_LIBS=ON
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
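
Rather than inspecting the `ls` output by eye, the presence of the three directories listed above could be checked in a script. This is a sketch only; `check_jni_dirs` is a hypothetical helper, and the directory names are taken from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: checks that each JNI directory from the listing above
# exists under the given lib directory; prints any that are missing and
# returns non-zero if at least one is absent.
check_jni_dirs() {
    lib_dir=$1
    missing=0
    for name in arrow_dataset_jni arrow_orc_jni gandiva_jni; do
        if [ ! -d "$lib_dir/$name" ]; then
            echo "missing: $lib_dir/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage from the repository root after the install step:
#   check_jni_dirs java-dist/lib
```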

Archery#

$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/

Building Java JNI Modules#

Testing#

By default, Maven uses the same Java version to both build the code and run the tests.

It is also possible to use a different JDK version for the tests. This requires Maven toolchains to be configured beforehand, and then a specific test property needs to be set.

Configuring Maven toolchains#

To use a JDK version for testing, first register it in the Maven toolchains.xml configuration file, usually located under ${HOME}/.m2, by adding the following snippet to it:

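[...]

<toolchain>
  <type>jdk</type>
  <provides>
    <version>21</version>
    <vendor>temurin</vendor>
  </provides>
  <configuration>
    <jdkHome>path/to/jdk/home</jdkHome>
  </configuration>
</toolchain>

[...]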
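
A quick way to confirm that a JDK version has been registered is to search the toolchains file for its version entry. The sketch below is a plain text match, not an XML parse, and assumes the version is written on one line as `<version>N</version>`; `toolchain_has_jdk` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the toolchains file declares the given JDK
# version. This is a text match on "<version>N</version>", not an XML parse.
toolchain_has_jdk() {
    file=$1
    version=$2
    [ -f "$file" ] && grep -q "<version>${version}</version>" "$file"
}

# Usage:
#   toolchain_has_jdk "${HOME}/.m2/toolchains.xml" 21 && echo "JDK 21 registered"
```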

Testing with a specific JDK#

To run Arrow tests with a specific JDK version, use the arrow.test.jdk-version property.

For example, to run Arrow tests with JDK 17, use the following snippet:

$ cd arrow/java
$ mvn -Darrow.test.jdk-version=17 clean verify
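
To run the tests against several JDKs in turn, the same property can be set in a loop. The sketch below only prints the commands (a dry run); the version list is an assumption, each version must already be registered in toolchains.xml, and `test_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven command for one test JDK version.
test_cmd() {
    echo "mvn -Darrow.test.jdk-version=$1 clean verify"
}

# Dry run: print one command per version. The list below is an assumption;
# adjust it to the JDKs registered in your toolchains.xml.
for version in 17 21; do
    test_cmd "$version"
done
```

Piping the output to `sh` from the arrow/java directory would run the builds one after another.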

IDE Configuration#

IntelliJ#

To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:

You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.

Common Errors#

Installing Nightly Packages#

Warning

These packages are not official releases. Use them at your own risk.

Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.

Installing from Apache Nightlies#

  1. Look up the nightly version number for the Arrow libraries used.
    For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).
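  2. Add Apache Nightlies Repository to the Maven/Gradle project.
    <properties>
        <arrow.version>9.0.0.dev501</arrow.version>
    </properties>
    ...
    <repositories>
        <repository>
            <id>arrow-apache-nightlies</id>
            <url>https://nightlies.apache.org/arrow/java</url>
        </repository>
    </repositories>
    ...
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>
    ...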
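
The listing URL in step 1 follows the usual Maven repository layout, so the URL for another artifact can be derived from its artifact ID. This is a sketch under that assumption; `nightly_listing_url` is a hypothetical helper, and only the arrow-memory URL is confirmed by the text above.

```shell
#!/bin/sh
# Hypothetical helper: prints the Apache Nightlies listing URL for an artifact,
# following the pattern of the arrow-memory example URL in step 1.
nightly_listing_url() {
    echo "https://nightlies.apache.org/arrow/java/org/apache/arrow/$1/"
}

nightly_listing_url arrow-memory
nightly_listing_url arrow-vector
```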

Installing Manually#

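  1. Decide which nightly packages repository to use, for example: ursacomputing/crossbow
  2. Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <arrow.version>9.0.0.dev501</arrow.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>flight-core</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>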
  3. Download the necessary pom and jar files to a temporary directory:
    $ mkdir nightly-packaging-2022-07-30-0-github-java-jars
    $ cd nightly-packaging-2022-07-30-0-github-java-jars
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
    $ tree
    .
    ├── arrow-flight-9.0.0.dev501.pom
    ├── arrow-format-9.0.0.dev501.jar
    ├── arrow-format-9.0.0.dev501.pom
    ├── arrow-java-root-9.0.0.dev501.pom
    ├── arrow-memory-9.0.0.dev501.pom
    ├── arrow-memory-core-9.0.0.dev501.jar
    ├── arrow-memory-core-9.0.0.dev501.pom
    ├── arrow-memory-netty-9.0.0.dev501.jar
    ├── arrow-memory-netty-9.0.0.dev501.pom
    ├── arrow-vector-9.0.0.dev501.jar
    ├── arrow-vector-9.0.0.dev501.pom
    ├── flight-core-9.0.0.dev501.jar
    └── flight-core-9.0.0.dev501.pom
  4. Install the artifacts to the local Maven repository with mvn install:install-file:
    $ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
  5. Validate that the packages were installed:
    $ tree ~/.m2/repository/org/apache/arrow
    .
    ├── arrow-flight
    │   ├── 9.0.0.dev501
    │   │   └── arrow-flight-9.0.0.dev501.pom
    ├── arrow-format
    │   ├── 9.0.0.dev501
    │   │   ├── arrow-format-9.0.0.dev501.jar
    │   │   └── arrow-format-9.0.0.dev501.pom
    ├── arrow-java-root
    │   ├── 9.0.0.dev501
    │   │   └── arrow-java-root-9.0.0.dev501.pom
    ├── arrow-memory
    │   ├── 9.0.0.dev501
    │   │   └── arrow-memory-9.0.0.dev501.pom
    ├── arrow-memory-core
    │   ├── 9.0.0.dev501
    │   │   ├── arrow-memory-core-9.0.0.dev501.jar
    │   │   └── arrow-memory-core-9.0.0.dev501.pom
    ├── arrow-memory-netty
    │   ├── 9.0.0.dev501
    │   │   ├── arrow-memory-netty-9.0.0.dev501.jar
    │   │   └── arrow-memory-netty-9.0.0.dev501.pom
    ├── arrow-vector
    │   ├── 9.0.0.dev501
    │   │   ├── _remote.repositories
    │   │   ├── arrow-vector-9.0.0.dev501.jar
    │   │   └── arrow-vector-9.0.0.dev501.pom
    └── flight-core
        ├── 9.0.0.dev501
        │   ├── flight-core-9.0.0.dev501.jar
        │   └── flight-core-9.0.0.dev501.pom
  6. Compile your project as usual with mvn clean install.
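
The install commands in step 4 differ only in the file name, so they can be generated from the downloaded files instead of being typed one by one. The sketch below assumes every file is named `<artifactId>-<version>.<packaging>`, as in the listing from step 3; `install_cmd` is a hypothetical helper, and it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: derives the install command from a file name of the form
# <artifactId>-<version>.<packaging> and prints it (dry run, nothing is executed).
install_cmd() {
    file=$1
    version=$2
    packaging=${file##*.}            # text after the last dot: pom or jar
    artifact=${file%-"$version".*}   # strip the "-<version>.<packaging>" suffix
    # Single quotes keep $(pwd) literal so it is expanded when the command is run.
    printf 'mvn install:install-file -Dfile="$(pwd)/%s" -DgroupId=org.apache.arrow -DartifactId=%s -Dversion=%s -Dpackaging=%s\n' \
        "$file" "$artifact" "$version" "$packaging"
}

# Dry run for two of the downloaded files.
for f in arrow-format-9.0.0.dev501.pom arrow-format-9.0.0.dev501.jar; do
    install_cmd "$f" 9.0.0.dev501
done
```

After reviewing the printed commands, they can be piped to `sh` from the download directory to perform the installation.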

Installing Staging Packages#

Warning

These packages are not official releases. Use them at your own risk.

Arrow staging builds are created when a Release Candidate (RC) is being prepared. This allows users to test the RC in their applications before voting on the release.

Installing from Apache Staging#

  1. Look up the next version number for the Arrow libraries used.
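  2. Add Apache Staging Repository to the Maven/Gradle project.
    <properties>
        <arrow.version>9.0.0</arrow.version>
    </properties>
    ...
    <repositories>
        <repository>
            <id>arrow-apache-staging</id>
            <url>https://repository.apache.org/content/repositories/staging</url>
        </repository>
    </repositories>
    ...
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>
    ...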
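
One possible way to check that a staged artifact resolves before changing a project is the `get` goal of the Maven dependency plugin. The sketch below only prints such a command; `staging_get_cmd` is a hypothetical helper, and using `dependency:get` here is a suggestion rather than part of the documented procedure.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a maven-dependency-plugin command
# that tries to resolve one Arrow artifact from the staging repository.
staging_get_cmd() {
    artifact=$1
    version=$2
    repo="https://repository.apache.org/content/repositories/staging"
    echo "mvn dependency:get -Dartifact=org.apache.arrow:${artifact}:${version} -DremoteRepositories=${repo}"
}

# Dry run for the arrow-vector dependency from step 2.
staging_get_cmd arrow-vector 9.0.0
```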