Configure applications to use a specific Java Virtual Machine

Amazon EMR releases have different default Java Virtual Machine (JVM) versions. This page explains JVM support for different Amazon EMR releases and applications.

Considerations

For information about the supported Java versions for applications, see the application pages in the Amazon EMR Release Guide.

Keep in mind the following application-specific considerations when you choose your runtime version:

Application-specific Java configuration notes

Spark: To run Spark with a non-default Java version, you must configure both Spark and Hadoop. For examples, see Override the JVM. Configure JAVA_HOME in spark-env to update the Java runtime of primary instance processes, such as spark-submit, spark-shell, and the Spark History Server. Modify the Hadoop configuration to update the Java runtime of the Spark executors and the YARN ApplicationMaster.
Spark RAPIDS: You can run RAPIDS with the configured Java version for Spark.
Iceberg: You can run Iceberg with the configured Java version of the application that is using it.
Delta: You can run Delta with the configured Java version of the application that is using it.
Hudi: You can run Hudi with the configured Java version of the application that is using it.
Hadoop: To update the JVM for Hadoop, modify hadoop-env. For examples, see Override the JVM.
Hive: To set the Java version to 11 or 17 for Hive, configure the Hadoop JVM setting to the Java version that you want to use.
HBase: To update the JVM for HBase, modify hbase-env. By default, Amazon EMR sets the HBase JVM based on the JVM configuration for Hadoop unless you override the settings in hbase-env. For examples, see Override the JVM.
Flink: To update the JVM for Flink, modify flink-conf. By default, Amazon EMR sets the Flink JVM based on the JVM configuration for Hadoop unless you override the settings in flink-conf. For more information, see Configure Flink to run with Java 11.
Oozie: To configure Oozie to run on Java 11 or 17, configure the Oozie Server and the Oozie Launcher AM, and change your client-side executable and job configurations. You can also configure EmbeddedOozieServer to run on Java 17. For more information, see Configure Java version for Oozie.
Pig: Pig only supports Java 8. You can't use Java 11 or 17 with Hadoop and run Pig on the same cluster.

Override the JVM

To override the JVM setting for an Amazon EMR release (for example, to use Java 17 with a cluster that uses Amazon EMR release 6.12.0), supply the JAVA_HOME setting to its environment classification, which is application-env for all applications except Flink. For Flink, the environment classification is flink-conf. For steps to configure the Java runtime with Flink, see Configure Flink to run with Java 11.
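As a minimal sketch only, assuming Java 11 is installed at /usr/lib/jvm/jre-11, a flink-conf classification might look like the following. The containerized.master.env.JAVA_HOME and containerized.taskmanager.env.JAVA_HOME entries are standard Flink options that set JAVA_HOME in the JobManager and TaskManager containers, and env.java.home points Flink's own startup scripts at the same runtime; confirm the exact properties for your release against Configure Flink to run with Java 11.

[
  {
    "Classification": "flink-conf",
    "Properties": {
      "containerized.master.env.JAVA_HOME": "/usr/lib/jvm/jre-11",
      "containerized.taskmanager.env.JAVA_HOME": "/usr/lib/jvm/jre-11",
      "env.java.home": "/usr/lib/jvm/jre-11"
    }
  }
]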

Topics

Override the JVM setting with Apache Spark
Override the JVM setting with Apache HBase
Override the JVM setting with Apache Hadoop and Hive

Override the JVM setting with Apache Spark

When you use Spark with Amazon EMR releases 6.12 and higher, you can set the environment so that the executors use Java 11 or 17. When you use Spark with Amazon EMR releases lower than 5.x and write a driver for submission in cluster mode, the driver uses Java 7; however, you can set the environment to ensure that the executors use Java 8.

To override the JVM for Spark, set both the Hadoop and Spark classifications. The following example sets JAVA_HOME to Java 8 in both hadoop-env and spark-env.

[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  }
]
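To apply this configuration, save it as a JSON file and supply it when you create the cluster, for example through the --configurations option of the aws emr create-cluster command (--configurations file://./configurations.json, where configurations.json is a placeholder file name).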

The following example shows how to add the configuration parameters required on Amazon EMR releases 7.0.0 and higher to ensure that all components use a consistent Java version.

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executorEnv.JAVA_HOME": "/usr/lib/jvm/java-1.8.0",
      "spark.yarn.appMasterEnv.JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
    }
  },
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ],
    "Properties": {}
  }
]
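The spark.executorEnv.JAVA_HOME and spark.yarn.appMasterEnv.JAVA_HOME properties set JAVA_HOME in the YARN containers that host the Spark executors and the ApplicationMaster, while spark-env covers primary instance processes such as spark-submit, spark-shell, and the Spark History Server, and hadoop-env covers the Hadoop daemons. This matches the Spark guidance in the application notes above.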

Override the JVM setting with Apache HBase

To configure HBase to use Java 11, you can set the following configuration when you launch the cluster.


[
  {
    "Classification": "hbase-env", 
    "Configurations": [
      {
        "Classification": "export", 
        "Configurations": [], 
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/jre-11"
        }
      }
    ], 
    "Properties": {}
  }
]
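By default, HBase inherits the JVM configuration for Hadoop, so an hbase-env override like this one is only needed when you want HBase to run on a different Java version than Hadoop.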

Override the JVM setting with Apache Hadoop and Hive

The following example shows how to set the JVM to version 17 for Hadoop and Hive.

[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/jre-17"
        }
      }
    ],
    "Properties": {}
  }
]
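Because Hive uses the JVM that Hadoop is configured with (see the application notes above), this single hadoop-env classification covers both applications; no separate hive-env entry is needed.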

Service ports

The following are YARN and HDFS service ports. These settings reflect Hadoop defaults. Other application services are hosted at their default ports unless otherwise documented. For more information, see the project documentation for each application.

Port settings for YARN and HDFS

Setting Hostname/Port
fs.default.name default (hdfs://emrDeterminedIP:8020)
dfs.datanode.address default (0.0.0.0:50010)
dfs.datanode.http.address default (0.0.0.0:50075)
dfs.datanode.https.address default (0.0.0.0:50475)
dfs.datanode.ipc.address default (0.0.0.0:50020)
dfs.http.address default (0.0.0.0:50070)
dfs.https.address default (0.0.0.0:50470)
dfs.secondary.http.address default (0.0.0.0:50090)
yarn.nodemanager.address default (${yarn.nodemanager.hostname}:0)
yarn.nodemanager.localizer.address default (${yarn.nodemanager.hostname}:8040)
yarn.nodemanager.webapp.address default (${yarn.nodemanager.hostname}:8042)
yarn.resourcemanager.address default (${yarn.resourcemanager.hostname}:8032)
yarn.resourcemanager.admin.address default (${yarn.resourcemanager.hostname}:8033)
yarn.resourcemanager.resource-tracker.address default (${yarn.resourcemanager.hostname}:8031)
yarn.resourcemanager.scheduler.address default (${yarn.resourcemanager.hostname}:8030)
yarn.resourcemanager.webapp.address default (${yarn.resourcemanager.hostname}:8088)
yarn.web-proxy.address default (no-value)
yarn.resourcemanager.hostname emrDeterminedIP
Note

The term emrDeterminedIP refers to an IP address that is generated by the Amazon EMR control plane. In newer Amazon EMR releases, this convention has been removed, except for the yarn.resourcemanager.hostname and fs.default.name settings.
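If you need a service on a non-default port, you can override these properties with the corresponding configuration classification. As a sketch only, the following uses the hdfs-site classification (the standard Amazon EMR classification for hdfs-site.xml) to move the DataNode HTTP address; the port 50076 is an arbitrary example value.

[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.datanode.http.address": "0.0.0.0:50076"
    }
  }
]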

Application users

Applications run processes as their own user. For example, Hive JVMs run as user hive, MapReduce JVMs run as user mapred, and so on. This is demonstrated in the following process status example.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
hive      6452  0.2  0.7 853684 218520 ?       Sl   16:32   0:13 /usr/lib/jvm/java-openjdk/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=/usr/lib/hadoop
hive      6557  0.2  0.6 849508 202396 ?       Sl   16:32   0:09 /usr/lib/jvm/java-openjdk/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server2.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=/usr/lib/hadoop/l
hbase     6716  0.1  1.0 1755516 336600 ?      Sl   Jun21   2:20 /usr/lib/jvm/java-openjdk/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -Xmx1024m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/
hbase     6871  0.0  0.7 1672196 237648 ?      Sl   Jun21   0:46 /usr/lib/jvm/java-openjdk/bin/java -Dproc_thrift -XX:OnOutOfMemoryError=kill -9 %p -Xmx1024m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/
hdfs      7491  0.4  1.0 1719476 309820 ?      Sl   16:32   0:22 /usr/lib/jvm/java-openjdk/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-ip-10-71-203-213.log -Dhadoo
yarn      8524  0.1  0.6 1626164 211300 ?      Sl   16:33   0:05 /usr/lib/jvm/java-openjdk/bin/java -Dproc_proxyserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-
yarn      8646  1.0  1.2 1876916 385308 ?      Sl   16:33   0:46 /usr/lib/jvm/java-openjdk/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-y
mapred    9265  0.2  0.8 1666628 260484 ?      Sl   16:33   0:12 /usr/lib/jvm/java-openjdk/bin/java -Dproc_historyserver -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop