GitHub - apache/arrow-java: Official Java implementation of Apache Arrow

Arrow Java

Getting Started

The following guides explain the fundamental data structures used in the Java implementation of Apache Arrow.

Generated javadoc documentation is available here.

Building from source

Refer to Building Apache Arrow for documentation of environment setup and build instructions.

Flatbuffers dependency

Arrow uses Google's Flatbuffers to transport metadata. The Java version of the Flatbuffers library requires that generated Flatbuffers classes be used only with the same Flatbuffers version that generated them. Arrow therefore packages a version of the arrow-vector module that shades flatbuffers and arrow-format into a single JAR. Using the classifier "shade-format-flatbuffers" in your pom.xml selects this JAR; you can then exclude the original flatbuffers dependency or resolve it to a version of your choosing.

Updating the flatbuffers generated code

  1. Verify that your version of flatc matches the declared dependency:

```shell
$ flatc --version
flatc version 25.1.24

$ grep "dep.fbs.version" java/pom.xml
<dep.fbs.version>25.1.24</dep.fbs.version>
```

  2. Generate the Flatbuffers Java files as follows:

```shell
cd "$ARROW_HOME"

# remove the existing files
rm -rf java/format/src

# regenerate from the .fbs files
flatc --java -o java/format/src/main/java format/*.fbs

# prepend license header
mvn spotless:apply -pl :arrow-format
```
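The version check in step 1 can also be scripted so that it fails before any generated sources are deleted. This is a minimal sketch, assuming `flatc --version` prints `flatc version X.Y.Z` as in the sample output and that the pom declares a single `dep.fbs.version` property; the helper name `check_flatc_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed flatc version with dep.fbs.version in a pom.
# check_flatc_version is a hypothetical helper, not part of the Arrow build.
check_flatc_version() {
  local pom="${1:-java/pom.xml}"
  local installed declared
  # "flatc version 25.1.24" -> "25.1.24" (last whitespace-separated field)
  installed=$(flatc --version | awk '{print $NF}')
  # "<dep.fbs.version>25.1.24</dep.fbs.version>" -> "25.1.24"
  declared=$(sed -n 's:.*<dep\.fbs\.version>\(.*\)</dep\.fbs\.version>.*:\1:p' "$pom" | head -n 1)
  if [ "$installed" = "$declared" ]; then
    echo "ok: flatc $installed"
  else
    echo "mismatch: flatc $installed, pom declares $declared" >&2
    return 1
  fi
}
```

Run from `$ARROW_HOME`, `check_flatc_version java/pom.xml && rm -rf java/format/src` would only delete the old files when the two versions agree.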

Performance Tuning

There are several system properties and environment variables that users can configure. These trade safety (they turn off checks) for speed. Typically they are only used in production settings, after the code has been thoroughly tested with the checks enabled.
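As a minimal sketch, assuming the system property names read by Arrow's `BoundsChecking` and `NullCheckingForGet` classes (`arrow.enable_unsafe_memory_access` and `arrow.enable_null_check_for_get`; confirm them against your Arrow version), the speed-over-safety flags can be collected in one place and passed to `java`. These values are read once, when the classes are first loaded, so setting them on the JVM command line is the safest option. The helper name `arrow_fast_jvm_opts`, the jar, and the main class are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: JVM flags that trade safety checks for speed in Arrow Java.
# Property names are assumed from Arrow's BoundsChecking and
# NullCheckingForGet classes; arrow_fast_jvm_opts is a hypothetical helper.
arrow_fast_jvm_opts() {
  # true disables bounds checking on memory accesses
  printf '%s ' -Darrow.enable_unsafe_memory_access=true
  # false disables the null check performed by ValueVector get methods
  printf '%s\n' -Darrow.enable_null_check_for_get=false
}

# Hypothetical launch line; replace the jar and main class with your own.
# java $(arrow_fast_jvm_opts) -cp my-app.jar com.example.Main
```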

Java Properties

Java Code Style Guide

Arrow Java follows the Google style guide here with the following differences:

Refer to checkstyle.xml for rule specifics.

Test Logging Configuration

When running tests, Arrow Java uses the Logback logger with SLF4J. By default, it uses the logback.xml in the corresponding module's src/test/resources directory, which sets the default log level to INFO. To build Arrow Java with an alternate Logback configuration file, run the following command in the project root directory:

mvn -Dlogback.configurationFile=file:
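As a sketch, a small helper could write an alternate configuration with a DEBUG root level and print a Maven command pointing at it. The helper name `write_debug_logback`, the output path, and the `test` goal are hypothetical; the XML uses standard Logback elements (`configuration`, `appender`, `encoder`, `root`).

```shell
#!/usr/bin/env bash
# Sketch: write a DEBUG-level Logback config and print a Maven command
# that uses it. write_debug_logback is a hypothetical helper.
write_debug_logback() {
  local path="$1"
  cat > "$path" <<'EOF'
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
EOF
  # Build an absolute path so the file: URL does not depend on the module dir.
  local abs
  abs="$(cd "$(dirname "$path")" && pwd)/$(basename "$path")"
  printf 'mvn -Dlogback.configurationFile=file:%s test\n' "$abs"
}

# Usage (hypothetical location):
#   write_debug_logback /tmp/logback-debug.xml
```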

See Logback Configuration for more details.

Integration Tests

Integration tests that require more time or memory can be run by activating the integration-tests profile. This activates the Maven Failsafe plugin, and any class prefixed with IT is run during the testing phase. The integration tests currently need more than 4 GB of memory and take longer to complete. To activate the profile:

mvn -Pintegration-tests
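To see which test classes the IT-prefix convention picks up before starting a long run, a small helper can search the test source trees. This is a sketch: the helper name `list_it_classes` is hypothetical, it assumes the standard Maven `src/test/java` layout, and it checks only the `IT` prefix described above (Failsafe's default includes also match other patterns, such as class names ending in `IT`).

```shell
#!/usr/bin/env bash
# Sketch: list test sources whose class name starts with "IT".
# list_it_classes is a hypothetical helper, not part of the Arrow build.
list_it_classes() {
  local root="${1:-.}"
  # -path matches any module's src/test/java tree; -name enforces the prefix.
  find "$root" -path '*/src/test/java/*' -type f -name 'IT*.java' | sort
}

# Usage from the project root (the verify goal is an assumption):
#   list_it_classes .
#   mvn -Pintegration-tests verify
```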