SPO-Project concepts and tools (Part 2)
In this blog post I will discuss the concepts of auto-vectorization, SIMD, SVE and SVE2. These are core concepts of software optimization.
Auto-vectorization:
In the early days, computers had a single logic unit that
was capable of executing one instruction on one pair of operands at a given
time. For this reason, computer programs and languages were built to execute sequentially.
However, modern computers are capable of performing many tasks at a time.
Many optimizing compilers perform automatic vectorization, which
replaces sequential operations with parallel operations where possible.
The result is a vector implementation, which applies
one operation to multiple pairs of operands at a given time.
On AArch64 systems there are three extensions that auto-vectorization
can target: SIMD (Advanced SIMD, also known as Neon), SVE and SVE2.
For auto-vectorization to take effect, the appropriate compiler flags need to be used, such as -O3 or -ftree-vectorize.
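As a minimal sketch of the kind of loop a compiler can auto-vectorize, consider adding two arrays element by element. The function name add_arrays and its parameters are made up for illustration; they do not come from any library.

```c
#include <stddef.h>

/* Element-wise sum of two arrays (hypothetical example function).
   The restrict qualifiers promise the compiler that the three arrays
   do not overlap, which makes it easier for the vectorizer to prove
   that several iterations can safely be done at once. */
void add_arrays(const int *restrict a, const int *restrict b,
                int *restrict out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];   /* the same operation on every element */
    }
}
```

If this is saved in a file (say add_arrays.c, an assumed name), building it with gcc -O3 -c add_arrays.c gives the vectorizer a chance to run. Adding GCC's -fopt-info-vec flag asks the compiler to report which loops it vectorized, although the exact result depends on the compiler version and target.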
For a more detailed explanation with code examples, you can follow this link
SIMD:
SIMD is one type of extension for auto-vectorization. It
stands for Single Instruction, Multiple Data. As the name suggests, it enables processing
of multiple data elements with a single instruction, instead of the conventional
sequential approach where one element is processed at a time. SIMD operations cannot
be used when the data elements need to be processed in different ways.
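To make the "single instruction, multiple data" idea concrete, here is a small sketch using the vector extension supported by GCC and Clang, which lets one C expression operate on several values at once. The type name v4si and the functions add4 and sum_lanes are assumptions chosen for this example.

```c
/* Four 32-bit ints packed into one 16-byte vector (GCC/Clang extension). */
typedef int v4si __attribute__((vector_size(16)));

/* A single '+' adds all four lanes. On AArch64 the compiler can
   typically turn this into one SIMD add instruction. */
v4si add4(v4si a, v4si b)
{
    return a + b;
}

/* Add the four lanes of a vector back together into one scalar. */
int sum_lanes(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

For example, calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} produces the four sums in one step. This only works because every lane gets the same operation, which is the restriction described above.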
We can build a program for an Armv8 system using the following
command (followed by the name of the source file):
gcc -g -O3 -c -march=armv8-a
SVE and SVE2:
SVE stands for Scalable Vector Extension. SVE2 is
its Armv9 successor, which at the time of writing is practically not available on any
system. SVE is a newer SIMD instruction set, used as an extension to
AArch64, that allows flexible vector length implementations. SVE2
combines the capabilities of SVE and Neon, so it covers more functional domains in terms of
data-level parallelism.
The main difference between SVE and SVE2 is the functional
coverage of the instruction set. While SVE was designed for HPC and ML applications,
SVE2 extends data processing beyond these applications.
To use the SVE capability, the following command should be
used:
gcc -g -O3 -c -march=armv8-a+sve
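The flexible vector length can be illustrated with a short sketch that uses the SVE intrinsics from the arm_sve.h header. The function name add_floats is made up for this example. The SVE path is only compiled when the compiler targets SVE (it then defines the __ARM_FEATURE_SVE macro); otherwise a plain loop is used so the file still builds on other systems.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* out[i] = a[i] + b[i] for n floats (hypothetical example function). */
void add_floats(const float *a, const float *b, float *out, int64_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes in one vector. It is only
       known at run time, so the same code works for any vector length. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        /* Predicate: only lanes whose index is still below n are active,
           so no separate loop is needed for the leftover elements. */
        svbool_t pg = svwhilelt_b32_s64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
    }
#else
    /* Fallback when the compiler is not targeting SVE. */
    for (int64_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Note that the loop never mentions a fixed vector width. That is the main design difference from fixed-width SIMD code, where the number of lanes is decided when the program is written.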
To use the SVE2 capability, the following command should be
used:
gcc -g -O3 -c -march=armv8-a+sve2
One interesting thing to note is that on older systems that do not support SVE or SVE2, we can use an emulator for only the parts that require that capability. To run a program under the emulator we can use:
qemu-aarch64
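To decide whether the emulator is needed at all, a program can ask the system whether SVE is available. The sketch below assumes Linux on AArch64, where the kernel reports CPU features through getauxval(AT_HWCAP) and the HWCAP_SVE bit. The function name cpu_has_sve is made up for this example, and on any other platform it simply reports that SVE is unavailable.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>    /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>   /* HWCAP_SVE */
#endif

/* Returns 1 if the CPU (or emulator) running this program reports SVE
   support, 0 otherwise (hypothetical example function). */
int cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;  /* not AArch64 Linux: treat SVE as unavailable */
#endif
}
```

If this returns 0 on real hardware, the parts of the program that were built with SVE would need to run under the emulator instead.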
For more detailed learning, you can follow this link