The Vector API in Java | Baeldung
1. Introduction
The Vector API, which is an incubator API in the Java ecosystem, is used to express vector computations within Java on supported CPU architectures. It aims to provide performance gains on vector computations that are superior to the equivalent scalar alternative.
In Java 19, a fourth round of incubation was proposed for the Vector API as part of JEP 426.
In this tutorial, we’ll explore the Vector API, its associated terminology, and how we can leverage the API.
2. Scalars, Vectors, and Parallelism
Understanding the idea of scalars and vectors in CPU operations is important before diving deep into the Vector API.
2.1. Processing Units and CPU
A CPU uses a number of processing units to perform operations. A processing unit computes only one value at a time. This value is called a scalar value, as it is just that, a single value. An operation can either be a unary operation, which operates on a single operand, or a binary operation, which operates on two. Incrementing a number by 1 is an example of a unary operation, whereas adding two numbers is a binary operation.
A processing unit takes a certain amount of time to perform these operations, and we measure that time in cycles. Depending on the operation, the processing unit might need a single cycle or many cycles to produce its result.
2.2. Parallelism
A conventional modern CPU has multiple cores, and each core houses multiple processing units that are capable of performing operations. This makes it possible to execute operations on these processing units in parallel. When several threads run their programs on separate cores, we get parallel execution of operations.
When we have a massive calculation, such as adding up a huge set of numbers from a large data source, we can split the data into smaller chunks and distribute them among several threads in the hope of getting faster processing. This is one of the ways to do parallel computing.
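As a rough illustration of this thread-based approach, the sketch below splits an array into contiguous chunks and sums each chunk on its own thread. The class name ChunkedSum and its sum() method are hypothetical and only serve this example; a real application might prefer parallel streams or a shared pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only.
final class ChunkedSum {

    // Sums data by handing each worker thread one contiguous chunk.
    // Assumes threads > 0.
    static long sum(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(t * chunk, data.length);
                final int to = Math.min(from + chunk, data.length);
                parts.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for each partial sum
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("summation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4)); // 500500
    }
}
```

Each thread still works on one scalar value at a time; the speed-up comes only from running several such loops concurrently.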
2.3. SIMD Processors
We can do parallel computing differently by using what is called a SIMD processor. SIMD stands for Single Instruction Multiple Data. This form of parallelism doesn’t depend on multithreading. Instead, SIMD processors rely on multiple processing units that perform the same operation at the same time, ideally within a single CPU cycle. The units share the program (instruction) that is executed but not the underlying data, hence the name. They perform the same operation but on different operands.
Unlike a scalar processor, which loads one value from memory at a time, a SIMD machine loads an array of values, such as integers, from memory into its registers before operating. The way SIMD hardware is organized enables this load of several values to occur as a single operation. SIMD machines allow us to perform computations on arrays in parallel without actually relying on concurrent programming.
Since a SIMD machine sees memory as an array, or a range of values, we call such a range a vector, and any operation that a SIMD machine performs on it is a vector operation. Leveraging the principles of the SIMD architecture is therefore a powerful and efficient way to do parallel processing.
3. The Java Vector API
Now that we know what vectors are, let’s try to understand the basics of the Vector API provided by Java. A vector, in Java, is represented by the abstract class Vector<E>. Here, E is the boxed type of one of the scalar primitive integral types (byte, short, int, long) or floating-point types (float, double).
3.1. Shapes, Species, and Lanes
We only have a pre-defined space to store and work with a vector, which ranges from 64 to 512 bits as of now. For example, if we have a Vector of Integer values and 256 bits to store it, we’ll have 8 components in total, because the size of a primitive int value is 32 bits. These components are called lanes in the context of the Vector API.
The shape of a vector is its size in bits. A vector of ints with a shape of 512 bits will have 16 lanes and can operate on 16 ints at a time, while a 64-bit one will have only 2. Here, we use the term lane to reflect how data flows in lanes within a SIMD machine.
The species of a vector is the combination of the vector’s shape and element type, such as int, float, etc. It is represented by VectorSpecies<E>.
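To make the relationship between shape, element type, and lanes concrete, here is a small sketch that queries a few of the predefined species. The class name SpeciesInspector and the lanes() helper are assumptions made for this example. Running it requires the --add-modules jdk.incubator.vector flag.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration only.
final class SpeciesInspector {

    // Lane count = shape in bits / element size in bits.
    static int lanes(VectorSpecies<?> species) {
        return species.vectorBitSize() / species.elementSize();
    }

    public static void main(String[] args) {
        VectorSpecies<Integer> int256 = IntVector.SPECIES_256;
        System.out.println(int256.vectorBitSize()); // 256
        System.out.println(int256.elementSize());   // 32
        System.out.println(int256.length());        // 8

        // Same shape, wider element type, fewer lanes
        System.out.println(DoubleVector.SPECIES_256.length()); // 4

        // The helper agrees with length()
        System.out.println(lanes(IntVector.SPECIES_512)); // 16
    }
}
```

The fixed-size species, such as SPECIES_256, are usable on any platform, but they only map to a single hardware register where the CPU actually supports that width.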
3.2. Lane Operations on Vectors
Vector operations fall broadly into two categories: lane-wise operations and cross-lane operations.
A lane-wise operation, as the name suggests, applies a scalar operation to each lane independently, on one or more vectors at a time. Such an operation can combine a lane of one vector with the corresponding lane of a second vector, for instance, during an add operation.
On the other hand, a cross-lane operation can compute or modify data from different lanes of a vector. Sorting the components of a vector is an example of a cross-lane operation. Cross-lane operations can produce scalars or vectors of different shapes from the source vectors. Cross-lane operations can be further classified into permutation and reduction operations.
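The difference between the two categories can be sketched with a fixed 128-bit int species, which holds four lanes. The class LaneOperations and its method names are hypothetical; the example assumes every input array has at least four elements and needs --add-modules jdk.incubator.vector to run.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorShuffle;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration only.
final class LaneOperations {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128; // 4 int lanes

    // Lane-wise: lane i of the result depends only on lane i of the inputs.
    static int[] laneWiseAdd(int[] a, int[] b) {
        return IntVector.fromArray(S, a, 0)
          .add(IntVector.fromArray(S, b, 0))
          .toArray();
    }

    // Cross-lane reduction: all lanes are combined into one scalar.
    static int reduceSum(int[] a) {
        return IntVector.fromArray(S, a, 0).reduceLanes(VectorOperators.ADD);
    }

    // Cross-lane permutation: lane i of the result comes from lane 3 - i.
    static int[] reverse(int[] a) {
        VectorShuffle<Integer> reversed = VectorShuffle.fromValues(S, 3, 2, 1, 0);
        return IntVector.fromArray(S, a, 0).rearrange(reversed).toArray();
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        System.out.println(Arrays.toString(laneWiseAdd(a, b))); // [11, 22, 33, 44]
        System.out.println(reduceSum(a));                       // 10
        System.out.println(Arrays.toString(reverse(a)));        // [4, 3, 2, 1]
    }
}
```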
3.3. Hierarchy of the Vector API
The Vector class has six abstract subclasses, one for each of the six supported element types: ByteVector, ShortVector, IntVector, LongVector, FloatVector, and DoubleVector. Because specific implementations matter on SIMD machines, shape-specific subclasses further extend these classes for each type, for example Int128Vector, Int512Vector, etc.
4. Computations Using Vector API
Let’s finally look at some Vector API code. We’ll look at lane-wise and cross-lane operations in the upcoming sections.
4.1. Adding Two Arrays
We want to add two integer arrays and store the information in a third array. The traditional scalar way to do this would be:
public int[] addTwoScalarArrays(int[] arr1, int[] arr2) {
    int[] result = new int[arr1.length];
    for (int i = 0; i < arr1.length; i++) {
        result[i] = arr1[i] + arr2[i];
    }
    return result;
}
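Let’s now write the same code the vector way. The Vector API classes live in the jdk.incubator.vector package, which we need to import into our class. Because the module is still incubating, we also need to pass --add-modules jdk.incubator.vector when compiling and running the code.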
Since we would be dealing with vectors, the very first thing we need to do is to create vectors from the two arrays. We use the fromArray() method of the Vector API for this step. This method requires us to provide the species of the vector that we want to create and the start offset of the array from where to begin the loading.
The offset would be 0 in our case, as we want to load the entire array from the start. We can use the default SPECIES_PREFERRED for our species, which uses the maximal bit size suitable for its platform:
static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;
var v1 = IntVector.fromArray(SPECIES, arr1, 0);
var v2 = IntVector.fromArray(SPECIES, arr2, 0);
Once we have the two vectors from the array, we use the add() method on one of the vectors by passing the second vector:
var result = v1.add(v2);
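Finally, we convert the vector result into an array and return it. Note that this version processes only the first SPECIES.length() elements and expects each array to hold at least that many:
public int[] addTwoVectorArrays(int[] arr1, int[] arr2) {
    // loads SPECIES.length() elements from each array, starting at index 0
    var v1 = IntVector.fromArray(SPECIES, arr1, 0);
    var v2 = IntVector.fromArray(SPECIES, arr2, 0);
    var result = v1.add(v2);
    return result.toArray();
}
Assuming the above code runs on a SIMD machine, the add operation adds all the lanes of the two vectors with a single vector instruction rather than one lane at a time.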
4.2. VectorMasks
The code demonstrated above comes with limitations. It runs correctly and provides the advertised performance only if the number of elements in the arrays matches the number of lanes the SIMD machine can handle. This introduces us to the idea of vector masks, represented by VectorMask<E>, which behave like an array of boolean values. We use a VectorMask when the input data doesn’t fill the entire vector.
A mask selects the lanes to which an operation is applied. The operation is applied to a lane if the corresponding mask value is true, and a fallback action is performed if it is false.
These masks help us perform operations independently of the vector shape and size. We can use the length() method of the species, which returns the number of lanes at runtime.
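The effect of a mask is easier to see on a small, fixed-size vector. The sketch below uses the four-lane SPECIES_128 for ints; the class MaskDemo and its methods are hypothetical names for this example, and the arrays are assumed to hold at least four elements. For a masked add(), lanes whose mask value is false keep the value of the first vector.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration only.
final class MaskDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128; // 4 int lanes

    // Adds b to a only in the lanes where select is true.
    static int[] maskedAdd(int[] a, int[] b, boolean[] select) {
        VectorMask<Integer> mask = VectorMask.fromArray(S, select, 0);
        return IntVector.fromArray(S, a, 0)
          .add(IntVector.fromArray(S, b, 0), mask)
          .toArray();
    }

    // Counts how many lanes starting at offset still fall below limit.
    static int activeLanes(int offset, int limit) {
        return S.indexInRange(offset, limit).trueCount();
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        boolean[] select = {true, false, true, false};
        System.out.println(Arrays.toString(maskedAdd(a, b, select))); // [11, 2, 33, 4]

        // For an array of length 6, the second stride covers only indices 4 and 5
        System.out.println(activeLanes(0, 6)); // 4
        System.out.println(activeLanes(4, 6)); // 2
    }
}
```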
Here’s a slightly modified version that iterates over the input arrays in strides of the vector length and then does a tail cleanup. Because the main loop stops at loopBound(), every lane is in range there and the mask has all its lanes set; the leftover elements are handled by the scalar loop:
public int[] addTwoVectorsWithMasks(int[] arr1, int[] arr2) {
    int[] finalResult = new int[arr1.length];
    int i = 0;
    for (; i < SPECIES.loopBound(arr1.length); i += SPECIES.length()) {
        // all lanes are set here, as i + SPECIES.length() never exceeds arr1.length
        var mask = SPECIES.indexInRange(i, arr1.length);
        var v1 = IntVector.fromArray(SPECIES, arr1, i, mask);
        var v2 = IntVector.fromArray(SPECIES, arr2, i, mask);
        var result = v1.add(v2, mask);
        result.intoArray(finalResult, i, mask);
    }
    // tail cleanup loop
    for (; i < arr1.length; i++) {
        finalResult[i] = arr1[i] + arr2[i];
    }
    return finalResult;
}
This code is now much safer to execute and runs independently of the shape of the vector.
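As an alternative sketch, the scalar tail loop can be dropped entirely by letting the mask handle the last, partially filled stride. The class name MaskedTailAdd is hypothetical, and the code assumes arr2 is at least as long as arr1. Masked loads and stores may be slower on hardware without native mask support, so this variant trades some potential speed for brevity.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration only.
final class MaskedTailAdd {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Assumes arr2.length >= arr1.length.
    static int[] add(int[] arr1, int[] arr2) {
        int[] out = new int[arr1.length];
        // Iterates past loopBound(); the mask switches off out-of-range lanes
        for (int i = 0; i < arr1.length; i += SPECIES.length()) {
            VectorMask<Integer> mask = SPECIES.indexInRange(i, arr1.length);
            IntVector v1 = IntVector.fromArray(SPECIES, arr1, i, mask);
            IntVector v2 = IntVector.fromArray(SPECIES, arr2, i, mask);
            // Only in-range lanes are written back
            v1.add(v2).intoArray(out, i, mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] b = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(Arrays.toString(add(a, b)));
        // [11, 22, 33, 44, 55, 66, 77, 88, 99, 110]
    }
}
```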
4.3. Computing the Norm of a Vector
In this section, we look at another simple mathematical calculation, the norm of two values. The norm is the value we get when we add the squares of the two values and then take the square root of the sum.
Let’s see what the scalar operation looks like first:
public float[] scalarNormOfTwoArrays(float[] arr1, float[] arr2) {
    float[] finalResult = new float[arr1.length];
    for (int i = 0; i < arr1.length; i++) {
        finalResult[i] = (float) Math.sqrt(arr1[i] * arr1[i] + arr2[i] * arr2[i]);
    }
    return finalResult;
}
We’ll now try to write the vector alternative to the above code.
First, we obtain the preferred species for FloatVector, which is optimal in this scenario:
static final VectorSpecies<Float> PREFERRED_SPECIES = FloatVector.SPECIES_PREFERRED;
We’ll reuse the loop structure from the previous section. Our loop runs up to the loopBound value of the first array in strides of the species length. In each step, we load the float values into vectors and perform the same mathematical operation as in our scalar version.
Finally, we perform a tail cleanup with an ordinary scalar loop on the leftover elements. The final code is quite similar to our previous example:
public float[] vectorNormalForm(float[] arr1, float[] arr2) {
    float[] finalResult = new float[arr1.length];
    int i = 0;
    // use the float species consistently for the bound, the stride, and the loads
    int upperBound = PREFERRED_SPECIES.loopBound(arr1.length);
    for (; i < upperBound; i += PREFERRED_SPECIES.length()) {
        var va = FloatVector.fromArray(PREFERRED_SPECIES, arr1, i);
        var vb = FloatVector.fromArray(PREFERRED_SPECIES, arr2, i);
        var vc = va.mul(va)
          .add(vb.mul(vb))
          .sqrt();
        vc.intoArray(finalResult, i);
    }
    // tail cleanup
    for (; i < arr1.length; i++) {
        finalResult[i] = (float) Math.sqrt(arr1[i] * arr1[i] + arr2[i] * arr2[i]);
    }
    return finalResult;
}
4.4. Reduction Operation
Reduction operations in the Vector API are operations that combine the elements of a vector into a single result. They allow us to perform calculations such as summing the elements of a vector or finding the maximum or minimum value within the vector. An average can then be derived from the sum.
The Vector API provides several reduction methods that can leverage SIMD hardware. Some common ones include the following:
- reduceLanes(op): This method takes an associative operation from VectorOperators, such as ADD, MAX, or MIN, and combines all lanes of the vector into a single value
- reduceLanes(op, mask): This method is similar to the above, except that it only combines the lanes that are set in the given mask
- reduceLanesToLong(op): This method performs the same kind of reduction but returns the result as a long, regardless of the element type
We’ll see an example to compute the average of a vector.
We can use the reduceLanes(ADD) to compute the sum of all the elements and then perform a scalar division by the length of the array:
public double averageOfaVector(int[] arr) {
    double sum = 0;
    for (int i = 0; i < arr.length; i += SPECIES.length()) {
        var mask = SPECIES.indexInRange(i, arr.length);
        var v = IntVector.fromArray(SPECIES, arr, i, mask);
        sum += v.reduceLanes(VectorOperators.ADD, mask);
    }
    return sum / arr.length;
}
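The method above reduces once per stride. A common variation, sketched below under the assumption that int overflow is not a concern, keeps the partial results in a vector accumulator and performs a single cross-lane reduction after the loop. The class Reductions and its methods are hypothetical names; the same pattern is shown for MAX.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration only.
final class Reductions {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Accumulates lane-wise, then reduces across lanes once.
    static int sum(int[] arr) {
        IntVector acc = IntVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(arr.length);
        for (; i < bound; i += SPECIES.length()) {
            acc = acc.add(IntVector.fromArray(SPECIES, arr, i));
        }
        int total = acc.reduceLanes(VectorOperators.ADD);
        for (; i < arr.length; i++) { // scalar tail
            total += arr[i];
        }
        return total;
    }

    // Returns Integer.MIN_VALUE for an empty array.
    static int max(int[] arr) {
        IntVector acc = IntVector.broadcast(SPECIES, Integer.MIN_VALUE);
        int i = 0;
        int bound = SPECIES.loopBound(arr.length);
        for (; i < bound; i += SPECIES.length()) {
            acc = acc.max(IntVector.fromArray(SPECIES, arr, i));
        }
        int result = acc.reduceLanes(VectorOperators.MAX);
        for (; i < arr.length; i++) { // scalar tail
            result = Math.max(result, arr[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // 5050
        System.out.println(max(data)); // 100
    }
}
```

Whether this is actually faster than reducing in every iteration depends on the hardware and the JIT compiler, so it is worth measuring before adopting it.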
5. Caveats Associated With Vector API
While we can appreciate the Vector API’s benefits, we should take it with a pinch of salt. Firstly, this API is still in the incubation phase, so it may change between releases. There is, however, a plan to have vector classes declared as primitive classes once the language supports them.
As mentioned above, the Vector API has a hardware dependency, as it relies on SIMD instructions. Some features may not be available, or may not perform as well, on every platform and architecture. Moreover, vectorized operations always carry some overhead compared to traditional scalar ones, for instance in code complexity and maintenance.
It is also difficult to benchmark vector operations on generic hardware without knowing the underlying architecture. However, the JEP provides some guidance on doing this.
6. Conclusion
The benefits of using the Vector API, albeit cautiously, are tremendous. The performance gains and the simplified vectorization of operations provide benefits to the graphics industry, large-scale computation, and many more. We looked at the important terminology associated with the Vector API and worked through some code examples.
The code backing this article is available on GitHub.