Naziur Rahman khan Fall 2022 SPO600 Project_Stage3
I have used Python to build an auto-vectorization tool that modifies existing functions for different architectures. It calls the GCC compiler and passes the appropriate flags to enable auto-vectorization for that specific code.
For example, the "-O3" flag enables aggressive optimization in GCC, including auto-vectorization. In addition, the "-march=armv8-a+sve", "-march=armv8-a+sve2", and "-march=armv8-a" flags select the target instruction set, so the compiler can vectorize for SVE, SVE2, or ASIMD respectively.
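A minimal sketch of how a tool might turn these flags into one GCC command per architecture. The flag values come from the paragraph above; the ARCH_FLAGS table, the gcc_command helper, and the compiler name "gcc" are assumptions for illustration (a cross-compiler may have a different name):

```python
# Hypothetical table mapping an architecture label to the -march value listed above
ARCH_FLAGS = {
    "asimd": "-march=armv8-a",
    "sve": "-march=armv8-a+sve",
    "sve2": "-march=armv8-a+sve2",
}


def gcc_command(source, arch, output):
    """Return the argv list that compiles one source file for one architecture (hypothetical helper)."""
    # -O3 turns on GCC's loop and SLP vectorizers; -march selects the SIMD extension to target
    return ["gcc", "-O3", ARCH_FLAGS[arch], "-c", source, "-o", output]
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises an error if the compiler fails. Returning a list instead of a single string avoids shell quoting problems with paths that contain spaces.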
In summary, auto-vectorization is a useful technique for improving the performance of programs on ARM CPUs that support the SVE, SVE2, and ASIMD instruction sets. By compiling C code with GCC and the appropriate flags, the compiler can vectorize suitable loops automatically, which lets programs take advantage of the SIMD capabilities of modern ARM processors.
I completed Stage 2 of the auto-vectorization tool, which is described here: https://nkhan170.blogspot.com/2022/12/spo-project-stage2.html
Based on the feedback on Stage 2 about the location of the input file, I have updated the code so that it can read the function.c file from any location.
In the Stage 2 code, I passed the input file argument like this:
$ python autovectorization_tool.py --inputfile function.c
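For reference, an option like --inputfile can be parsed with Python's standard argparse module. This is a minimal sketch: only the flag name comes from the post, while the parse_args helper and the help text are assumptions:

```python
import argparse


def parse_args(argv=None):
    """Parse the tool's command line (hypothetical helper; only --inputfile comes from the post)."""
    parser = argparse.ArgumentParser(description="Build auto-vectorized variants of C functions")
    # The value may be a bare filename ("function.c") or a full path ("/home/.../function.c")
    parser.add_argument("--inputfile", required=True, help="C source file containing the functions")
    return parser.parse_args(argv)
```

Passing argv explicitly (instead of relying on sys.argv) makes the parser easy to test, for example parse_args(["--inputfile", "function.c"]).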
STAGE 3 Updates:
- I have removed the restriction that the function.c file must be in the same directory as the tool
- I have improved the Python tool so that it can work with multiple functions
- I have also improved the tool so that it can work with any function return type, not just void
The issue with the Stage 2 code was that the "function.c" file had to be in the same directory as the Python tool. So, if I pass an argument like this, it gives the following error (shown in the picture below):
$ python3 autovectorization_tool.py --inputfile /home/nkhan170/spo600_stage_2/SPO-600/function.c
The reason is that the code assumed "function.c" was in the current directory, so it never used the path given in the --inputfile argument.
To solve the issue, I modified the code so that it reads the directory and the filename separately using the "os.path" module, and then uses "os.path.join()" to build the full path of the input file.
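A minimal sketch of this split-then-join approach using os.path. The resolve_input_path name is hypothetical, and making the path absolute first with os.path.abspath() is an assumption added so that relative paths work too:

```python
import os


def resolve_input_path(inputfile):
    """Split the --inputfile value into directory and filename, then rebuild the full path (hypothetical helper)."""
    # abspath() resolves a relative path such as "function.c" against the current working directory
    absolute = os.path.abspath(inputfile)
    directory = os.path.dirname(absolute)  # e.g. /home/nkhan170/spo600_stage_2/SPO-600
    filename = os.path.basename(absolute)  # e.g. function.c
    full_path = os.path.join(directory, filename)
    return directory, filename, full_path
```

Keeping the directory separate is convenient if generated files should be written next to the input file instead of next to the tool.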
After this change, the tool runs successfully with "function.c" located in any directory:
As part of the Stage 3 improvements, I also changed the code so that it works with multiple functions defined in the function.c file.
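The post does not show how the tool locates the functions. One possible approach, sketched here as an assumption and not the tool's actual code, is a regular expression that captures the return type, name, and parameter list of every definition, so int and void functions are picked up alike. It assumes definitions start at column 0 and have a simple parameter list:

```python
import re

# Assumption: each definition starts at column 0, as is usual in C source files
FUNC_DEF = re.compile(
    r"^((?:(?:static|const|unsigned|signed|inline)\s+)*"  # optional qualifiers
    r"[A-Za-z_]\w*[\s*]+)"                                  # base return type plus spaces or pointer stars
    r"([A-Za-z_]\w*)\s*"                                    # function name
    r"\(([^)]*)\)\s*\{",                                    # parameter list followed by an opening brace
    re.MULTILINE,
)


def find_functions(source):
    """Return (return_type, name, params) for every function definition found in source."""
    results = []
    for match in FUNC_DEF.finditer(source):
        # Collapse runs of whitespace, e.g. "unsigned  char *" becomes "unsigned char *"
        return_type = " ".join(match.group(1).split())
        results.append((return_type, match.group(2), match.group(3).strip()))
    return results
```

Requiring the opening brace means prototypes ending in ";" are skipped, and anchoring at column 0 keeps indented statements such as "if (x) {" inside function bodies from being mistaken for definitions.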
Contents of the "function.c" file:
To test this, I had to modify the main.c file to check whether the tool works with the newly added functions. I added two new functions:
int merge_channels(unsigned char *image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor);
void combine_channels(unsigned char *image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor);
I had to add calls to these two functions in the "main.c" file to check whether they work.
I also had to add the declarations (prototypes) of these two functions to the "adjust_channels.h" file so that main.c can see them.
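In the post these declarations are written by hand. If the tool were to generate them, a hypothetical helper could turn a parsed signature into a header line, since a declaration is simply the signature followed by ";" instead of a body. This is a sketch, not part of the described tool:

```python
def make_prototype(return_type, name, params):
    """Build a C declaration line from a parsed signature (hypothetical helper)."""
    # Pointer return types such as "unsigned char *" already end with '*', so skip the extra space
    separator = "" if return_type.endswith("*") else " "
    return f"{return_type}{separator}{name}({params});"
```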
Contents of the "adjust_channels.h" file:
Contents of the "main.c" file:
After all the modifications, I ran the tool using the following command:
$ python3 autovectorization_tool.py --inputfile /home/nkhan170/spo600_stage_2/SPO-600/function.c
The output looks fine. When the binary runs directly, all three functions use the ASIMD implementation:
$ ./main tests/input/bree.jpg 1.0 3.0 5.0 tests/output/breelaa.jpg
Using asimd_adjust_channels() implementation #1 - Naive (autovectorizable)
Using asimd_merge_channels() implementation #1 - Naive (autovectorizable)
Using asimd_combine_channels() implementation #1 - Naive (autovectorizable)
$ qemu-aarch64 ./main tests/input/bree.jpg 1.0 3.0 5.0 tests/output/breelaa.jpg
The output shows that all three functions ran properly using the SVE2 build:
Using sve2_adjust_channels() implementation #1 - Naive (autovectorizable)
Using sve2_merge_channels() implementation #1 - Naive (autovectorizable)
Using sve2_combine_channels() implementation #1 - Naive (autovectorizable)
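The asimd_ and sve2_ names in the outputs above suggest that each architecture gets its own renamed copy of every function. A minimal sketch of such renaming with Python's re module, assuming a hypothetical prefix_function helper (not the tool's actual code), could look like this:

```python
import re


def prefix_function(source, name, prefix):
    """Rename a C function wherever it is defined or called, e.g. merge_channels -> sve2_merge_channels."""
    # \b keeps partial identifiers such as my_merge_channels untouched;
    # the lookahead only renames occurrences followed by "(" (definitions and calls)
    pattern = rf"\b{re.escape(name)}\b(?=\s*\()"
    return re.sub(pattern, prefix + name, source)
```

Applying it once per prefix to a copy of the source yields one file per architecture, each of which can then be compiled with the matching -march flag.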
I have pushed all the updated files to the GitHub repository so that you can review them and run the final auto-vectorization tool. The tool can be found here: