Naziur Rahman khan Fall 2022 SPO600 Project_Stage3

I have used a Python auto vectorization tool to modify existing functions for different architectures. The tool calls the GCC compiler and passes the appropriate flags to enable auto vectorization for the given code.

For example, with the GCC compiler, the "-O3" flag enables aggressive optimization, which includes auto vectorization. Additionally, a "-march" flag selects the instruction set to vectorize for: "-march=armv8-a+sve" for SVE, "-march=armv8-a+sve2" for SVE2, or "-march=armv8-a" for ASIMD.
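
As an illustration of how a Python script can assemble these GCC invocations, here is a minimal sketch. The function name build_gcc_command, the dictionary MARCH_FLAGS, and the output file names are made up for this example and are not taken from the actual tool; only the flags come from the paragraph above.

```python
import shlex

# Flags taken from the paragraph above; the dictionary keys are illustrative labels.
MARCH_FLAGS = {
    "asimd": "-march=armv8-a",
    "sve": "-march=armv8-a+sve",
    "sve2": "-march=armv8-a+sve2",
}


def build_gcc_command(source, output, target):
    """Return the GCC argument list that compiles `source` for `target`."""
    if target not in MARCH_FLAGS:
        raise ValueError(f"unknown target: {target}")
    # -O3 turns on GCC's loop vectorizer; -c stops after producing an object file.
    return ["gcc", "-O3", MARCH_FLAGS[target], "-c", source, "-o", output]


if __name__ == "__main__":
    for target in MARCH_FLAGS:
        cmd = build_gcc_command("function.c", f"function_{target}.o", target)
        # Print instead of running, so the sketch works without an AArch64 toolchain.
        # To actually compile: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Building the command as a list (rather than one string) avoids shell quoting problems when the source path contains spaces.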

In summary, auto vectorization is a useful technique for improving the performance of programs on ARM CPUs with SVE, SVE2, and ASIMD instructions. By using the GCC compiler and the appropriate flags, it is possible to enable auto vectorization in the C programming language. This can help to optimize programs and take advantage of the SIMD capabilities of modern ARM processors.

I completed stage2 of the auto vectorization tool, which can be found here: https://nkhan170.blogspot.com/2022/12/spo-project-stage2.html

Following the remarks on stage2 about the location of the input function file, I have updated the code so that it can read the function.c file from any location.

In the stage2 code, I passed the input file argument like this:

    $ python autovectorization_tool.py --inputfile function.c

STAGE 3 Updates:

  • I have removed the directory constraint on the function.c file
  • I have improved the Python tool so that it can work with multiple functions
  • I have also improved the tool so that it can work with any function return type, not just void

 

The issue with the stage2 code was that the "function.c" file had to be in the same directory as the Python tool. So, if I pass a full path like this, the tool fails with an error (shown in the picture below):

$ python3 autovectorization_tool.py --inputfile /home/nkhan170/spo600_stage_2/SPO-600/function.c


The reason is that the tool assumed "function.c" was in the current directory, so it could not find the file at the path given in the inputfile argument.

Solution:

To solve the issue, I modified the code so that it reads the directory and the filename separately using the "os.path" library, and then uses "os.path.join()" to build the full path of the function file.
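
The idea can be sketched as follows. This is a minimal example and not the tool's actual code: the helper name resolve_input_path is hypothetical, and falling back to the current directory for a bare filename is an assumption made here to keep the stage2 style of invocation working.

```python
import os


def resolve_input_path(inputfile):
    """Split an --inputfile value into directory and filename, then rejoin them."""
    directory, filename = os.path.split(inputfile)
    # A bare filename has an empty directory part; treat it as the current directory.
    directory = os.path.abspath(directory or ".")
    return os.path.join(directory, filename)


if __name__ == "__main__":
    # Works for a full path as well as for a bare filename.
    print(resolve_input_path("/home/nkhan170/spo600_stage_2/SPO-600/function.c"))
    print(resolve_input_path("function.c"))
```

Keeping the directory separately is also handy if the generated files should be written next to the input file rather than next to the tool.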

After this change, the tool ran successfully with "function.c" in any directory:



Here is the output of the generated files using the tool:


Next:

As per the stage3 improvements, I have improved the code so that it can work with multiple functions defined in the function.c file.

Contents of the "function.c" file:

void adjust_channels(unsigned char *image, int x_size, int y_size,
        float red_factor, float green_factor, float blue_factor) {

        printf("Using adjust_channels() implementation #1 - Naive (autovectorizable)\n");
       
/*

        The image is stored in memory as pixels of 3 bytes, representing red/green/blue values.
        Each of these values is multiplied by the corresponding adjustment factor, with
        saturation, and then stored back to the original memory location.
       
        This simple implementation causes int to float to int conversions.
       
*/

        for (int i = 0; i < x_size * y_size * 3; i += 3) {
                image[i]   = MIN((float)image[i]   * red_factor,   255);
                image[i+1] = MIN((float)image[i+1] * green_factor, 255);
                image[i+2] = MIN((float)image[i+2] * blue_factor,  255);
        }
}


int merge_channels(unsigned char *image, int x_size, int y_size,
        float red_factor, float green_factor, float blue_factor) {

        printf("Using merge_channels() implementation #1 - Naive (autovectorizable)\n");
       
/*

        The image is stored in memory as pixels of 3 bytes, representing red/green/blue values.
        Each of these values is multiplied by the corresponding adjustment factor, with
        saturation, and then stored back to the original memory location.
       
        This simple implementation causes int to float to int conversions.
       
*/

        for (int i = 0; i < x_size * y_size * 3; i += 3) {
                image[i]   = MIN((float)image[i]   * red_factor,   255);
                image[i+1] = MIN((float)image[i+1] * green_factor, 255);
                image[i+2] = MIN((float)image[i+2] * blue_factor,  255);
        }
        return 0;
}

void combine_channels(unsigned char *image, int x_size, int y_size,
        float red_factor, float green_factor, float blue_factor) {

        printf("Using combine_channels() implementation #1 - Naive (autovectorizable)\n");
       
/*

        The image is stored in memory as pixels of 3 bytes, representing red/green/blue values.
        Each of these values is multiplied by the corresponding adjustment factor, with
        saturation, and then stored back to the original memory location.
       
        This simple implementation causes int to float to int conversions.
       
*/

        for (int i = 0; i < x_size * y_size * 3; i += 3) {
                image[i]   = MIN((float)image[i]   * red_factor,   255);
                image[i+1] = MIN((float)image[i+1] * green_factor, 255);
                image[i+2] = MIN((float)image[i+2] * blue_factor,  255);
        }
}
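
To handle a file like the one above, a tool has to find every function definition together with its return type. The following is a rough sketch of one way to do that with a regular expression. It is a heuristic written for this post's style of C code, not the tool's actual parser, and the names FUNC_DEF, CONTROL_KEYWORDS, and find_functions are made up for the example.

```python
import re

# Heuristic match for a C function definition header such as
#   "int merge_channels(unsigned char *image, ...) {"
# It does not handle macros, function pointers, or code-like text in comments.
FUNC_DEF = re.compile(
    r"^\s*(?P<ret>[A-Za-z_][\w\s\*]*?)\s*\b(?P<name>[A-Za-z_]\w*)\s*"
    r"\((?P<params>[^;{}()]*)\)\s*\{",
    re.MULTILINE,
)

# Statements like "else if (x) {" look like "type name(...) {" to the regex.
CONTROL_KEYWORDS = {"if", "for", "while", "switch"}


def find_functions(source):
    """Return a list of (return_type, name) pairs for the definitions found."""
    found = []
    for match in FUNC_DEF.finditer(source):
        name = match.group("name")
        if name in CONTROL_KEYWORDS:
            continue
        # Collapse any run of whitespace inside the return type to one space.
        return_type = " ".join(match.group("ret").split())
        found.append((return_type, name))
    return found


SAMPLE = """
void adjust_channels(unsigned char *image, int x_size, int y_size,
        float red_factor, float green_factor, float blue_factor) {
        for (int i = 0; i < x_size * y_size * 3; i += 3) {
                image[i] = 0;
        }
}

int merge_channels(unsigned char *image, int x_size, int y_size) {
        return 0;
}
"""

if __name__ == "__main__":
    for return_type, name in find_functions(SAMPLE):
        print(return_type, name)
```

Capturing the return type separately from the name is what allows non-void functions such as merge_channels to be handled the same way as void ones.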

To test the tool, I had to modify the main.c file to check whether it works with the newly added functions. I have added two new functions:

int merge_channels(unsigned char *image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor);

void combine_channels(unsigned char *image, int x_size, int y_size, float red_factor, float green_factor, float blue_factor);

I have to add calls to these two functions in the "main.c" file to check whether they work.

I also need to add declarations of these two functions to the "adjust_channels.h" file so that main.c can see them.

Contents of the "adjust_channels.h" file:

// adjust_channels.h

#ifndef ADJUST_CHANNELS_H
#define ADJUST_CHANNELS_H

void adjust_channels(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor);

int merge_channels(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor);

void combine_channels(unsigned char *image, int x_size, int y_size,
    float red_factor, float green_factor, float blue_factor);
   
#endif


Contents of the "main.c" file:

/*

  image-adjust
 
  (C)2022 Seneca College of Applied Arts and Technology.
  Written by Chris Tyler. Licensed under the terms of the GPL version 2.
 
*/

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/param.h>

// adjust_channels is where all the real action is
// this file is just scaffolding!
#include "adjust_channels.h"

// Using the STBI image reader/writer
// See https://github.com/nothings/stb
#define STBI_NO_LINEAR
#define STBI_NO_HDR
#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include <stb_image_write.h>

int main(int argc, char *argv[]) {

    // ==================== Check arg count
    if (argc != 6) {
        dprintf(2, "\nUsage: %s input.jpg red green blue output.jpg\nWhere red/green/blue are in the range 0.0-2.0\n", argv[0]);
        return 1;
    }

    // ==================== Load the image file (arg 1)
    int x, y, n;
    unsigned char *image = stbi_load(argv[1],
        &x, &y, &n, 3);

    if (image == NULL) {
        dprintf(2, "Invalid argument or input image file did not load.\n");
        dprintf(2, "\nUsage: %s input.jpg red green blue output.jpg\nWhere red/green/blue are in the range 0.0-2.0\n", argv[0]);
        return 2;
    }
    printf("File '%s' loaded: %dx%d pixels, %d bytes per pixel.\n", argv[1], x, y, n);


   
    // ==================== Adjust the channels
   
    // Get arguments 2, 3, and 4; each should be a number in the range 0.0 .. 2.0
    // Yes this is ugly and should be improved, this is a quick & dirty test program :-)
    float redarg   = MIN(2, MAX(0, strtof(argv[2],NULL)));
    float greenarg = MIN(2, MAX(0, strtof(argv[3],NULL)));
    float bluearg  = MIN(2, MAX(0, strtof(argv[4],NULL)));
   
    printf("Adjustments:\tred: %8.6f   green: %8.6f   blue: %8.6f\n", redarg, greenarg, bluearg);
   
    adjust_channels(image, x, y, redarg, greenarg, bluearg);
    merge_channels(image, x, y, redarg, greenarg, bluearg);
    combine_channels(image, x, y, redarg, greenarg, bluearg);

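    // ==================== Save the resulting file (jpg) (arg 5)
    // The image was loaded with 3 channels forced, so write 3 components;
    // n holds the channel count of the original file, which may differ.
    stbi_write_jpg(argv[5], x, y, 3, image, 90);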
}


Now after all the modifications, I ran the tool using the following command:

python3 autovectorization_tool.py --inputfile /home/nkhan170/spo600_stage_2/SPO-600/function.c

The output looks fine:


 
Now if I try to run the compiled binary with the following command:

./main tests/input/bree.jpg 1.0 3.0 5.0 tests/output/breelaa.jpg



The output shows that all three functions ran properly using the ASIMD build:
Using asimd_adjust_channels() implementation #1 - Naive (autovectorizable)
Using asimd_merge_channels() implementation #1 - Naive (autovectorizable)
Using asimd_combine_channels() implementation #1 - Naive (autovectorizable)

Alternatively, run the compiled binary under the qemu-aarch64 emulator:
qemu-aarch64 ./main tests/input/bree.jpg 1.0 3.0 5.0 tests/output/breelaa.jpg


The output shows that all three functions ran properly using the SVE2 build:

Using sve2_adjust_channels() implementation #1 - Naive (autovectorizable)
Using sve2_merge_channels() implementation #1 - Naive (autovectorizable)
Using sve2_combine_channels() implementation #1 - Naive (autovectorizable)
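
Note that the printed names carry an architecture prefix even inside the printf string, which is what a plain whole-word text replacement over the source would produce. One way such a rename could be done is sketched below. This is an assumption about the approach, not code from the tool, and add_prefix is a hypothetical helper name.

```python
import re


def add_prefix(source, function_name, prefix):
    """Rename every whole-word occurrence of function_name to prefix + function_name."""
    pattern = re.compile(r"\b" + re.escape(function_name) + r"\b")
    # A function is used as the replacement so the new name is inserted literally.
    return pattern.sub(lambda match: prefix + function_name, source)


if __name__ == "__main__":
    line = 'printf("Using adjust_channels() implementation #1 - Naive\\n");'
    for prefix in ("asimd_", "sve_", "sve2_"):
        print(add_prefix(line, "adjust_channels", prefix))
```

Because the underscore counts as a word character, a name that is already prefixed (for example sve2_adjust_channels) is not matched again, so running the rename twice does not stack prefixes.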


I have put all the updated files in the GitHub repo so that you can take a look and run the final auto vectorization tool. The tool can be found here:

https://github.com/myseneca-162912216/SPO-600/tree/main/spo600_stage_3
