Posts

Naziur Rahman khan Fall 2022 SPO600 Project_Stage3

Image
I have used Python auto vectorization tool to modify existing functions for different architecture and it calls the GCC compiler and pass the appropriate flags to enable auto vectorization for that specific code. For example, to enable auto vectorization with the GCC compiler, the "-O3" flag can be used to enable aggressive optimization. Additionally, the "-march=armv8-a+sve","-march=armv8-a+sve2","-march=armv8-a" flag can be used to enable vectorization for the specific type of CPU being used, such as SVE, SVE2, or ASIMD. In summary, auto vectorization is a useful technique for improving the performance of programs on ARM CPUs with SVE, SVE2, and ASIMD instructions. By using the GCC compiler and the appropriate flags, it is possible to enable auto vectorization in the C programming language. This can help to optimize programs and take advantage of the SIMD capabilities of modern ARM processors. I have completed the stage2 for the auto vectorizat...

SPO-600 64-bit Assembly Language Lab4-part2

Image
  Task 4: In our previous part we had run the loop for 10 times. So, we only needed a single digit. However in the next task we will work with the aarch64 system to write a code that will loop to 30. So we have 2 digits to work with for which I will use 2 registers. Quotient was calculated by udiv which was my first digit. On the other hand remainder was calculated by the msub which was stored in the second digit. Once calculated I stored both of them registered in their specific locations on the string by using the strb. You can see all my codes from my github repo:  https://github.com/myseneca-162912216/SPO-600/tree/main/Lab%204/Aarch64  Below is the output of my code with the leading zero: Task 5: In this task I have to edit the code in such a way that it will remove the unnecessary leading zero of the first 10 numbers. I accomplished it by using a label. A label has been added before the second writing the second digit so that the first digit can be skipped. cmp com...

Naziur Rahman khan Fall 2022 SPO600 Project_Stage2

Image
While I first approach this project I was kind of blank like where to start my first step. I previously had no exposure to work with low level coding neither C coding. Till now I am only comfortable with doing some mid level python coding. My project is to build a tool that can be written in any language whose main purpose is to take function from a single file and create three copy of it with the variation of the function for each auto vectorization type and also add an ifunc resolver to choose which one of those functions to use based on the particular system capability.  We got our main file from our professor's github repo that he shared with us. I copied the files to my assignment armv8 machine called israel from this link: https://github.com/ctyler/spo600-fall2022-project-test-code.git I have kept all my project related file in my repo which can be accessed from here https://github.com/myseneca-162912216/SPO-600/tree/main/spo600_stage_2 I have taken some help regarding proces...

SPO-Project concepts and tools.(Part 2)

In this blog I will discuss the concept of Auto vectorization, SIMD, SVE and SVE2. This are the core concept of software optimization. Auto-vectorization: In early days, computers used to have one logic unit that was capable of executing one instruction on one pair of operands at a given time. For this reason computer programs and languages were built to execute sequentially. However modern computers have the capability to perform many task at a time. There are many optimizing compilers who perform automatic vectorization which enables to do some parallel operations where possible instead of only sequential operations. This concept is called vector implementation which can process one operation in multiple pair of operands at a given time. For AArch64 system there are three extensions for the Auto vectorization. These are SIMD, SVE, SVE2. For Auto vectorization to be in effect the flags need to be used. This includes –O3, -ftree-vectorize etc. For more detailed explanation wi...

SPO-Project concepts and tools.(Part 1)

Through the next couple of days, I will publish a series of blogs about different tools and concepts that will be useful for me to implement the final project of my SPO600 course. In this blog I will talk about GCC and how does it work. GCC Compiler: GCC stands for GNU Compiler Collection which is an optimizing compiler and it is a product of GNU project. It supports various programming languages, hardware architectures and operating systems. It is a key component in the GNU toolchain. Most of the projects related to GNU and the Linux kernel uses GCC as their compiler. This has been made available as a free software.   GCC working pr­inciple : GCC is a toolchain which compiles and links code with any of the library dependencies and then converts that code to assembly language. After that it prepares executable files from it. GCC follows the standard UNIX design mechanism which uses simple tools to perform individual tasks well. When we run GCC with a source code file, at fi...

Naziur Rahman khan Fall 2022 SPO600 Project (Algorithm-part2)

                                          Implementation of ifunc-capable Software in Python(Continue) The language to be used For this project, I will be using the Python language to discuss the implementation of ifunc-capable software. The ifunc capability is a feature that allows users to call a function in an executable file without actually having to write the code for it. This can help reduce time and effort on programming projects and make the process more efficient overall. In Python, this feature is accessed using two different functions: `__import__` and `__callable__`. These functions allow developers to create functions that can be called inside other files without requiring any coding or compilation. The first step towards implementing ifunc-capable software in Python is understanding how these two functions work and what they do. To begin, the `__impo...

Naziur Rahman khan Fall 2022 SPO600 Project (Algorithm-part1)

  Implementation of ifunc-capable Software in Python. Python interpreter has a great module called ifunc, which provides multi-threaded, compile-time function selection. It replaces some built-in functions with custom implementations at compile time. This allows the replacement function to specialize on a particular type or set of types. ifunc is implemented as an extension to the CPython interpreter and can be used with any language that can be compiled into Python bytecode, like Cython and PyPy. This blog post will show how useful it is to use ifunc-capable software in Python. We will also give some examples of code that uses ifunc. Plan for the project This project aims to develop an understanding of the implementation of ifunc-capable software in Python. If you're a Python programmer, then you know that ifunc is a powerful tool that can be used to great effect in your programs. However, implementing ifunc can be tricky, and it often needs to be clarified how to get starte...