Building Arrow Java — Apache Arrow v21.0.0.dev26
Contents
- Building Arrow Java
- System Setup
- Building
  * Building Java Modules
    * Maven
    * Docker compose
    * Archery
  * Building JNI Libraries (*.dylib / *.so / *.dll)
    * Maven
    * CMake
    * Archery
  * Building Java JNI Modules
- Testing
  * Configuring Maven toolchains
  * Testing with a specific JDK
- IDE Configuration
  * IntelliJ
- Common Errors
- Installing Nightly Packages
  * Installing from Apache Nightlies
  * Installing Manually
- Installing Staging Packages
  * Installing from Apache Staging
System Setup#
Arrow Java uses the Maven build system.
Building requires:
- JDK 11+
- Maven 3+
Note
CI will test all supported JDK LTS versions, plus the latest non-LTS version.
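The JDK requirement can be checked before starting a build. The following is a minimal sketch, not part of the Arrow build itself: the helper names `jdk_major` and `jdk_is_supported` are hypothetical, and the minimum of 11 is taken from the requirements listed above.

```shell
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme as well as the modern "17.0.9" scheme.
jdk_major() {
  ver="$1"
  case "$ver" in
    1.*) ver="${ver#1.}" ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep only the leading run of digits.
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Hypothetical helper: succeed when the version meets the JDK 11+ requirement.
jdk_is_supported() {
  [ "$(jdk_major "$1")" -ge 11 ]
}
```

The version string could be taken from the first line of `java -version`, which prints to standard error, for example with `java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p'`.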
Building#
All the instructions below assume that you have cloned the Arrow git repository:
$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
Arrow Java modules can be compiled with any of the following:
- Maven build tool.
- Docker Compose.
- Archery.
Building Java Modules#
To build the default modules, go to the project root and execute:
Maven#
$ cd arrow/java
$ export JAVA_HOME=
$ java --version
$ mvn clean install
Docker compose#
$ cd arrow/java
$ export JAVA_HOME=
$ java --version
$ docker compose run java
Archery#
$ cd arrow/java
$ export JAVA_HOME=
$ java --version
$ archery docker run java
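Each variant above sets JAVA_HOME before building. As a minimal sketch for macOS / Linux, the value could be sanity-checked first; the helper name `java_home_ok` is hypothetical and is not part of Arrow, Maven, or Archery.

```shell
# Hypothetical helper: succeed only if the given directory looks like a JDK
# home, i.e. it is non-empty and contains an executable bin/java.
java_home_ok() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}
```

A build script might call `java_home_ok "$JAVA_HOME"` and stop with a message when it fails, rather than letting the build pick up an unexpected JDK.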
Building JNI Libraries (*.dylib / *.so / *.dll)#
First, build the C++ shared libraries that the JNI bindings will use. You can build these manually, or use Archery to build them in a Docker container (this requires installing Docker, Docker Compose, and Archery).
Note
If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.
If you are building on Windows OS, see Developing on Windows.
Maven#
- To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow/java
$ export JAVA_HOME=
$ java --version
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ ls -latr ../java-dist/lib
|__ arrow_cdata_jni/
- To build only the JNI C Data Interface library (Windows):
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ dir "../java-dist/bin"
|__ arrow_cdata_jni/
- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow/java
$ export JAVA_HOME=
$ java --version
$ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
$ ls -latr ../java-dist/lib
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
- To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-jni-windows -N
$ dir "../java-dist/bin"
|__ arrow_dataset_jni/
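The choice between the two JNI profiles above depends only on the operating system, so a script could select it automatically. This is a minimal sketch: the helper name `jni_profile_for_os` is hypothetical, and it assumes that a Windows build is run from a POSIX-like shell (Git Bash, MSYS, or Cygwin) where `uname -s` is available.

```shell
# Hypothetical helper: map a `uname -s` value to the Maven profile that builds
# the JNI libraries other than the C Data Interface library.
# The profile names are the ones used in the commands above.
jni_profile_for_os() {
  case "$1" in
    Linux|Darwin) echo "generate-libs-jni-macos-linux" ;;
    MINGW*|MSYS*|CYGWIN*) echo "generate-libs-jni-windows" ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}
```

From arrow/java, the result could then be passed to Maven, for example `mvn generate-resources -P"$(jni_profile_for_os "$(uname -s)")" -N`.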
CMake#
- To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow
$ mkdir -p java-dist java-cdata
$ cmake \
-S java \
-B java-cdata \
-DARROW_JAVA_JNI_ENABLE_C=ON \
-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
-DBUILD_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ ls -latr java-dist/lib
|__ arrow_cdata_jni/
- To build only the JNI C Data Interface library (Windows):
$ cd arrow
$ mkdir java-dist, java-cdata
$ cmake ^
-S java ^
-B java-cdata ^
-DARROW_JAVA_JNI_ENABLE_C=ON ^
-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
-DBUILD_TESTING=OFF ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ dir "java-dist/bin"
|__ arrow_cdata_jni/
- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ brew uninstall aws-sdk-cpp
(We can't use aws-sdk-cpp installed by Homebrew because it has
an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
$ export JAVA_HOME=
$ mkdir -p java-dist cpp-jni
$ cmake \
-S cpp \
-B cpp-jni \
-DARROW_BUILD_SHARED=OFF \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_DEPENDENCY_USE_SHARED=OFF \
-DARROW_FILESYSTEM=ON \
-DARROW_GANDIVA=ON \
-DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
-DARROW_JSON=ON \
-DARROW_ORC=ON \
-DARROW_PARQUET=ON \
-DARROW_S3=ON \
-DARROW_SUBSTRAIT=ON \
-DARROW_USE_CCACHE=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=java-dist \
-DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
$ cmake \
-S java \
-B java-jni \
-DARROW_JAVA_JNI_ENABLE_C=OFF \
-DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
-DBUILD_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=java-dist \
-DCMAKE_PREFIX_PATH=$PWD/java-dist \
-DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
-DProtobuf_USE_STATIC_LIBS=ON
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
- To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow
$ mkdir java-dist, cpp-jni
$ cmake ^
-S cpp ^
-B cpp-jni ^
-DARROW_BUILD_SHARED=OFF ^
-DARROW_CSV=ON ^
-DARROW_DATASET=ON ^
-DARROW_DEPENDENCY_USE_SHARED=OFF ^
-DARROW_FILESYSTEM=ON ^
-DARROW_GANDIVA=OFF ^
-DARROW_JSON=ON ^
-DARROW_ORC=ON ^
-DARROW_PARQUET=ON ^
-DARROW_S3=ON ^
-DARROW_SUBSTRAIT=ON ^
-DARROW_USE_CCACHE=ON ^
-DARROW_WITH_BROTLI=ON ^
-DARROW_WITH_LZ4=ON ^
-DARROW_WITH_SNAPPY=ON ^
-DARROW_WITH_ZLIB=ON ^
-DARROW_WITH_ZSTD=ON ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_PREFIX=java-dist ^
-DCMAKE_UNITY_BUILD=ON ^
-GNinja
$ cd cpp-jni
$ ninja install
$ cd ../
$ cmake ^
-S java ^
-B java-jni ^
-DARROW_JAVA_JNI_ENABLE_C=OFF ^
-DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
-DARROW_JAVA_JNI_ENABLE_DEFAULT=ON ^
-DARROW_JAVA_JNI_ENABLE_GANDIVA=OFF ^
-DARROW_JAVA_JNI_ENABLE_ORC=ON ^
-DBUILD_TESTING=OFF ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_PREFIX=java-dist ^
-DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ dir "java-dist/bin"
|__ arrow_orc_jni/
|__ arrow_dataset_jni/
Archery#
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
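Instead of reading the directory listing by eye, a script could report which of the expected JNI library directories are absent. This is a minimal sketch: the helper name `missing_jni_dirs` is hypothetical, and the directory to inspect (java-dist or java-dist/lib, depending on the build route used) is passed in by the caller.

```shell
# Hypothetical helper: print the names of expected JNI library directories
# that are missing under the given distribution directory, one per line.
missing_jni_dirs() {
  dist="$1"
  shift
  for name in "$@"; do
    [ -d "$dist/$name" ] || printf '%s\n' "$name"
  done
}
```

For example, `missing_jni_dirs java-dist arrow_cdata_jni arrow_dataset_jni arrow_orc_jni gandiva_jni` prints nothing when all four directories from the listing above are present.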
Building Java JNI Modules#
- To compile the JNI bindings, use the arrow-c-data Maven profile:
$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=/java-dist/lib -Parrow-c-data clean install
- To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:
$ cd arrow/java
$ mvn \
-Darrow.cpp.build.dir=/java-dist/lib/ \
-Darrow.c.jni.dist.dir=/java-dist/lib/ \
-Parrow-jni clean install
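The property values above start at /java-dist/lib, with the leading part of the path not shown. The sketch below builds the arguments from a directory supplied by the caller. It rests on an assumption that is not stated above: that java-dist sits directly under the Arrow checkout, as in the CMake steps earlier. The helper name `jni_mvn_args` is hypothetical.

```shell
# Hypothetical helper: print the Maven arguments for the arrow-jni profile.
# ASSUMPTION: java-dist lives directly under the Arrow checkout passed as $1.
jni_mvn_args() {
  arrow_dir="${1%/}"   # drop one trailing slash, if any
  printf '%s %s %s\n' \
    "-Darrow.cpp.build.dir=$arrow_dir/java-dist/lib/" \
    "-Darrow.c.jni.dist.dir=$arrow_dir/java-dist/lib/" \
    "-Parrow-jni clean install"
}
```

From arrow/java, the output could be passed to Maven with `mvn $(jni_mvn_args "$(cd .. && pwd)")`. That form relies on shell word splitting, so it only works when the checkout path contains no spaces.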
Testing#
By default, Maven uses the same Java version to both build the code and run the tests.
It is also possible to use a different JDK version for the tests. This requires Maven toolchains to be configured beforehand, and then a specific test property needs to be set.
Configuring Maven toolchains#
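To use a JDK version for testing, first register it in the Maven toolchains.xml configuration file, usually located under ${HOME}/.m2, by adding the following snippet:
<toolchains>
  [...]
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>21</version>
      <vendor>temurin</vendor>
    </provides>
    <configuration>
      <jdkHome>path/to/jdk/home</jdkHome>
    </configuration>
  </toolchain>
  [...]
</toolchains>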
Testing with a specific JDK#
To run Arrow tests with a specific JDK version, use the arrow.test.jdk-version property.
For example, to run Arrow tests with JDK 17, use the following snippet:
$ cd arrow/java
$ mvn -Darrow.test.jdk-version=17 clean verify
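To cover several JDK versions in one pass, the same command can be generated once per version. This is a minimal sketch: the helper name `test_commands` is hypothetical, and each version passed to it is assumed to be registered in toolchains.xml already.

```shell
# Hypothetical helper: print one Maven test command per JDK version given.
# It only prints the commands; nothing is executed.
test_commands() {
  for v in "$@"; do
    printf 'mvn -Darrow.test.jdk-version=%s clean verify\n' "$v"
  done
}
```

Running `test_commands 11 17` from arrow/java prints two commands that can be reviewed and then executed in turn.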
IDE Configuration#
IntelliJ#
To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:
- In the Files tool window, find the path vector/target/generated-sources, right click the directory, and select Mark Directory as > Generated Sources Root. There is no need to mark other generated sources directories, as only the vector module generates sources.
- For JDK 11, due to an IntelliJ bug, you must go into Settings > Build, Execution, Deployment > Compiler > Java Compiler and disable “Use ‘--release’ option for cross-compilation (Java 9 and later)”. Otherwise you will get an error like “package sun.misc does not exist”.
- You may want to disable error-prone entirely if it gives spurious warnings (disable both error-prone profiles in the Maven tool window and “Reload All Maven Projects”).
- If using IntelliJ’s Maven integration to build, you may need to change <fork> to false in the pom.xml files due to an IntelliJ bug.
- To enable debugging JNI-based modules like dataset, activate specific profiles in the Maven tab under “Profiles”. Ensure the profiles arrow-c-data, arrow-jni, generate-libs-cdata-all-os, generate-libs-jni-macos-linux, and jdk11+ are enabled, so that the IDE can build them and enable debugging.
You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.
Common Errors#
- When working with the JNI code: if the C++ build cannot find dependencies, with errors like these:
Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
Could NOT find Lz4 (missing: LZ4_LIB)
Could NOT find zstd (missing: ZSTD_LIB)
Specify that the dependencies should be downloaded at build time (more details at Dependency Resolution):
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED
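The flags above all follow the pattern -D<name>_SOURCE=BUNDLED, so they can be generated from a list of dependency names. This is a minimal sketch; the helper name `bundled_flags` is hypothetical, and the names passed to it must match the spelling CMake expects, as in the list above.

```shell
# Hypothetical helper: turn dependency names into -D<name>_SOURCE=BUNDLED
# flags, separated by single spaces.
bundled_flags() {
  flags=""
  for dep in "$@"; do
    flags="${flags:+$flags }-D${dep}_SOURCE=BUNDLED"
  done
  printf '%s\n' "$flags"
}
```

For instance, `bundled_flags re2 Boost utf8proc Snappy ORC ZLIB` reproduces the six flags shown above on a single line, ready to append to the cmake invocation for the C++ build.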
Installing Nightly Packages#
Warning
These packages are not official releases. Use them at your own risk.
Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.
Installing from Apache Nightlies#
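- Look up the nightly version number for the Arrow libraries used. For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).
- Add Apache Nightlies Repository to the Maven/Gradle project.
<properties>
  <arrow.version>9.0.0.dev501</arrow.version>
</properties>
...
<repositories>
  <repository>
    <id>arrow-apache-nightlies</id>
    <url>https://nightlies.apache.org/arrow/java</url>
  </repository>
</repositories>
...
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-vector</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>
...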
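The version lookup uses one index page per artifact. As a minimal sketch, the page address can be derived from the artifact name; the helper name `nightly_index_url` is hypothetical, and it assumes that other artifacts follow the same path layout as the arrow-memory example above.

```shell
# Hypothetical helper: print the nightlies index URL for an Arrow artifact.
# ASSUMPTION: every artifact is published under the same path layout as the
# arrow-memory example.
nightly_index_url() {
  printf 'https://nightlies.apache.org/arrow/java/org/apache/arrow/%s/\n' "$1"
}
```

Calling `nightly_index_url arrow-vector` prints the page to open in a browser when looking for available versions of that artifact.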
Installing Manually#
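- Decide which nightly packages repository to use, for example: ursacomputing/crossbow
- Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
<properties>
  <maven.compiler.source>8</maven.compiler.source>
  <maven.compiler.target>8</maven.compiler.target>
  <arrow.version>9.0.0.dev501</arrow.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>flight-core</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>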
- Download the necessary pom and jar files to a temporary directory:
$ mkdir nightly-packaging-2022-07-30-0-github-java-jars
$ cd nightly-packaging-2022-07-30-0-github-java-jars
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
$ tree
.
├── arrow-flight-9.0.0.dev501.pom
├── arrow-format-9.0.0.dev501.jar
├── arrow-format-9.0.0.dev501.pom
├── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core-9.0.0.dev501.jar
├── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty-9.0.0.dev501.jar
├── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector-9.0.0.dev501.jar
├── arrow-vector-9.0.0.dev501.pom
├── flight-core-9.0.0.dev501.jar
└── flight-core-9.0.0.dev501.pom
- Install the artifacts to the local Maven repository with mvn install:install-file:
$ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
- Validate that the packages were installed:
$ tree ~/.m2/repository/org/apache/arrow
.
├── arrow-flight
│ ├── 9.0.0.dev501
│ │ └── arrow-flight-9.0.0.dev501.pom
├── arrow-format
│ ├── 9.0.0.dev501
│ │ ├── arrow-format-9.0.0.dev501.jar
│ │ └── arrow-format-9.0.0.dev501.pom
├── arrow-java-root
│ ├── 9.0.0.dev501
│ │ └── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory
│ ├── 9.0.0.dev501
│ │ └── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core
│ ├── 9.0.0.dev501
│ │ ├── arrow-memory-core-9.0.0.dev501.jar
│ │ └── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty
│ ├── 9.0.0.dev501
│ │ ├── arrow-memory-netty-9.0.0.dev501.jar
│ │ └── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector
│ ├── 9.0.0.dev501
│ │ ├── _remote.repositories
│ │ ├── arrow-vector-9.0.0.dev501.jar
│ │ └── arrow-vector-9.0.0.dev501.pom
└── flight-core
├── 9.0.0.dev501
│ ├── flight-core-9.0.0.dev501.jar
│ └── flight-core-9.0.0.dev501.pom
- Compile your project as usual with mvn clean install.
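The thirteen mvn install:install-file commands above differ only in values that can be read from each file name. This is a minimal sketch of deriving them; the helper name `install_file_args` is hypothetical, and it assumes files named <artifactId>-<version>.<pom|jar>, as in the download step.

```shell
# Hypothetical helper: derive the `mvn install:install-file` arguments from a
# downloaded file name of the form <artifactId>-<version>.<pom|jar>.
install_file_args() {
  file="$1"
  version="$2"
  base="$(basename "$file")"
  packaging="${base##*.}"              # text after the last dot: pom or jar
  artifact="${base%-"$version".*}"     # strip the "-<version>.<ext>" suffix
  printf '%s\n' "-Dfile=$file -DgroupId=org.apache.arrow -DartifactId=$artifact -Dversion=$version -Dpackaging=$packaging"
}
```

In the download directory, a loop over the .pom and .jar files could then run `mvn install:install-file $(install_file_args "$f" 9.0.0.dev501)` for each file `$f`. The unquoted substitution relies on word splitting, so this only works for paths without spaces.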
Installing Staging Packages#
Warning
These packages are not official releases. Use them at your own risk.
Arrow staging builds are created when a Release Candidate (RC) is being prepared. This allows users to test the RC in their applications before voting on the release.
Installing from Apache Staging#
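- Look up the next version number for the Arrow libraries used.
- Add Apache Staging Repository to the Maven/Gradle project.
<properties>
  <arrow.version>9.0.0</arrow.version>
</properties>
...
<repositories>
  <repository>
    <id>arrow-apache-staging</id>
    <url>https://repository.apache.org/content/repositories/staging</url>
  </repository>
</repositories>
...
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-vector</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>
...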