Parallel Counting Sort: A Modification of the Counting Sort Algorithm

Sorting is one of the classic problems in computer engineering. One well-known sorting algorithm is Counting Sort. Counting Sort has one problem: it cannot sort positive and negative numbers in the same input list. Modified Counting Sort was created to solve that problem by splitting the numbers before the sorting process begins. This paper presents another modification of this algorithm, called Parallel Counting Sort. Parallel Counting Sort reduces the execution time by roughly 76% to 79% compared with Modified Counting Sort, especially on large datasets (10,000 and 100,000 numbers).


Pratyaksa Ocsa Nugraha Saian a a Fakultas Teknologi Informasi, Universitas Kristen Satya Wacana

Introduction
Sorting is one of many classic problems in computer engineering. Although sorting is mostly associated with computer engineering, it is used in other fields of study as well. For example, sorting has been applied in education [1] [2], biology [3], and economics [4] [5]. Within computer engineering itself, sorting is used in many areas, such as network engineering [6] [7], big data processing [8] [9], and database processing [10].
An algorithm is a procedure created to solve a problem under given circumstances [11]. A sorting algorithm is simply an algorithm that performs the sorting process. Many researchers have created sorting algorithms, and many are available today, such as Bubble Sort [12], Quick Sort, Merge Sort [13], and many more. Like two sides of a coin, each of these algorithms has its own advantages and disadvantages.
While many algorithms are available, deciding which one to use is not easy [14]. Choosing the wrong one has consequences: it can increase memory usage and the execution time of the application. Choosing the hardware on which to run the sorting process is also crucial. Research [15] reports a significant difference between using a high-end Central Processing Unit (CPU) and a Graphics Processing Unit (GPU): the GPU can run 20 times faster than a high-end CPU, although a GPU is usually more expensive than a high-end CPU.
In 2009, Cormen et al. described in their book a sorting algorithm that does not compare elements to obtain the sorted list. Instead, it counts how many times each value appears in the list, which is why it is called Counting Sort [16]. Like any sorting algorithm, Counting Sort takes a list (usually an array) of integers or characters and arranges it in a given order (ascending or descending). Counting Sort assumes that every element in the array is an integer from 0 to k, where k is a positive integer. As a result, Counting Sort cannot sort negative and positive numbers in the same array. Another study [17] solved this by dividing the negative, zero, and positive numbers into different arrays, sorting the arrays one by one, and then merging them into one sorted array. In this paper, that algorithm is called Modified Counting Sort. The algorithm proposed here, Parallel Counting Sort, parallelizes the sorting process, because at least two processes run at the same time.
The rest of the paper is structured as follows: Section 2 discusses several theories and previous work related to this paper, together with the modified algorithm; Section 3 presents the testing process and its results; and Section 4 presents the conclusion and future work.

The Material and Method
Before discussing the modified algorithm further, some theory and related material need to be discussed. All [18].
Another study, by Dwi M J Purnomo et al., implemented Bubble Sort on a Field Programmable Gate Array (FPGA), in both serial and parallel form, and measured its memory usage and execution time. The serial Bubble Sort had better memory usage than the parallel Bubble Sort, while the parallel Bubble Sort had a better execution time than the serial one [19].
Another study, by Ivan Kamarov et al., implemented a brute-force algorithm to construct a k-Nearest Neighbor Graph (k-NNG). The algorithm was then implemented on a Graphics Processing Unit (GPU) and combined with a quicksort algorithm. The experiment showed that the combination of the brute-force algorithm and quicksort on a GPU can process larger data with a better execution time [20].
To understand this paper, it is important to know some underlying theory. The following subsections explain sorting algorithms, the Counting Sort algorithm, and the Parallel Counting Sort algorithm.

Sorting Algorithm
In computer science, a sorting algorithm is an algorithm that rearranges a list into a specific order.
The list can be an array, a vector, or any data type that can store more than one element. Integer and char data types are commonly used in sorting. A sorting algorithm always produces a list arranged in ascending or descending order. In an ascending list, the values go from the smallest to the largest; in a descending list, they go from the largest to the smallest.
Fig. 1 Example of an unsorted list, sorted list (ascending and descending)
Fig. 1 is an example of the input and output of a sorting algorithm. The unsorted list contains several numbers (e.g., 5, 2, 3, 4, 1, 7) in random order, neither ascending nor descending. In some cases, such a list needs to be arranged properly to extract better information from it, and that is what a sorting algorithm does. After sorting, the list is always in a proper order: either ascending (1, 2, 3, 4, 5, 7) or descending (7, 5, 4, 3, 2, 1).
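As an illustration of the two orderings only, the Fig. 1 list can be sorted with the C++ standard library's std::sort, passing std::greater<int> for descending order. This sketch shows the expected input and output, not one of the algorithms studied in this paper; the function names sortAscending and sortDescending are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: returns a copy of `list` in ascending order (smallest to largest).
std::vector<int> sortAscending(std::vector<int> list) {
    std::sort(list.begin(), list.end());
    return list;
}

// Hypothetical helper: returns a copy of `list` in descending order (largest to smallest).
std::vector<int> sortDescending(std::vector<int> list) {
    std::sort(list.begin(), list.end(), std::greater<int>());
    return list;
}
```

Applied to {5, 2, 3, 4, 1, 7}, the two helpers give {1, 2, 3, 4, 5, 7} and {7, 5, 4, 3, 2, 1}, matching Fig. 1.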
Researchers tend to measure how good a sorting algorithm is. They usually consider several criteria, such as running (execution) time or how much memory the sorting process needs. In this paper, only the execution time is used as the benchmark for the tested algorithms.

Counting Sort Algorithm
The Counting Sort algorithm always starts with one list of unordered integers (List A). It then creates another list (List B) to record how many times each number appears in List A. After both lists are created, the algorithm performs the counting process: it goes through each element of List A and increments the corresponding count in List B. The counts are then accumulated, so that each element of List B holds the number of elements less than or equal to its index; this value gives the "correct position" of that number in the sorted output. Finally, the algorithm creates one last list (List C) for the output and places each number of List A at the position given by List B. An implementation of the Counting Sort algorithm can be seen in Fig. 2.

Fig. 2 Pseudocode of Counting Sort Algorithm
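The counting procedure described above can be sketched in C++ as follows, in the style of the formulation by Cormen et al. [16]. This is a minimal sketch that assumes every value lies in the range 0 to k; it is not necessarily identical to the pseudocode in Fig. 2, and the function name countingSort is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counting Sort for integers in [0, k].
// A: unsorted input (List A); k: largest value that may appear in A.
std::vector<int> countingSort(const std::vector<int>& A, int k) {
    assert(k >= 0);
    // List B: one counter per possible value 0..k.
    std::vector<int> B(k + 1, 0);
    for (int value : A) {
        assert(value >= 0 && value <= k);  // Counting Sort precondition
        ++B[value];
    }
    // Prefix sums: B[v] now holds how many elements are <= v,
    // i.e. one past the last output position of value v.
    for (int v = 1; v <= k; ++v) {
        B[v] += B[v - 1];
    }
    // List C: walk A from the back so equal values keep their order (stable).
    std::vector<int> C(A.size());
    for (std::size_t i = A.size(); i-- > 0;) {
        C[--B[A[i]]] = A[i];
    }
    return C;
}
```

For the Fig. 1 list, countingSort({5, 2, 3, 4, 1, 7}, 7) returns {1, 2, 3, 4, 5, 7}. The two passes over A and the one pass over B are where the O(n + k) running time comes from.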
Like any algorithm, Counting Sort can be evaluated by its time complexity. The time complexity of Counting Sort is O(n + k) [16], where n is the number of elements in the array and k is the range of the input, that is, the range between the smallest and the largest number in List A.

Parallel Counting Sort Algorithm
As explained in Section 1, the problem with Counting Sort appears when List A contains negative integers. This problem can be solved by splitting the list into a negative list and a positive list. The flowchart of this process can be seen in Fig. 3.

Fig. 3 Flowchart of Modified Counting Sort Algorithm
The flowchart in Fig. 3 shows that the splitting process, which separates positive and negative numbers, happens before the sorting process. Every element in the unordered list is compared with zero: if the number is greater than zero, it is stored in the "ArrPositive" list, and if it is smaller than zero, it is stored in the "ArrNegative" list. The two lists are sorted separately, and the results are then joined into one list again. The implementation of this process can be seen in Fig. 4.

Fig. 4 Pseudocode of Modified Counting Sort
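A minimal C++ sketch of the Modified Counting Sort flow follows. Two details are assumptions, because the description above does not fix them: zeros are placed in ArrPositive (the text only covers numbers greater or smaller than zero), and ArrNegative stores the magnitudes of the negative numbers so that ordinary Counting Sort can sort them, after which they are read back in reverse and negated. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counting Sort for non-negative integers; k is the largest value present.
std::vector<int> countingSortNonNegative(const std::vector<int>& a) {
    if (a.empty()) return {};
    const auto range = std::minmax_element(a.begin(), a.end());
    assert(*range.first >= 0);  // Counting Sort precondition
    const int k = *range.second;
    std::vector<int> count(k + 1, 0);
    for (int v : a) ++count[v];
    for (int v = 1; v <= k; ++v) count[v] += count[v - 1];  // prefix sums = positions
    std::vector<int> out(a.size());
    for (std::size_t i = a.size(); i-- > 0;) out[--count[a[i]]] = a[i];  // stable placement
    return out;
}

// Sketch of Modified Counting Sort: split, sort each part one after the other, join.
std::vector<int> modifiedCountingSort(const std::vector<int>& input) {
    std::vector<int> arrPositive;  // values >= 0 (assumption: zeros go here)
    std::vector<int> arrNegative;  // magnitudes of values < 0 (assumes no INT_MIN)
    for (int v : input) {
        if (v < 0) arrNegative.push_back(-v);
        else arrPositive.push_back(v);
    }
    // Sequential: ArrPositive first, then ArrNegative.
    const std::vector<int> sortedPositive = countingSortNonNegative(arrPositive);
    const std::vector<int> sortedMagnitudes = countingSortNonNegative(arrNegative);

    // The largest magnitude is the most negative value, so read magnitudes backwards.
    std::vector<int> result;
    result.reserve(input.size());
    for (auto it = sortedMagnitudes.rbegin(); it != sortedMagnitudes.rend(); ++it) {
        result.push_back(-*it);
    }
    result.insert(result.end(), sortedPositive.begin(), sortedPositive.end());
    return result;
}
```

For example, modifiedCountingSort({3, -1, 0, -7, 2, 0, -1}) returns {-7, -1, -1, 0, 0, 2, 3}.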
This paper proposes another modification of this algorithm. In Modified Counting Sort, after the input list is separated into "ArrPositive" and "ArrNegative", the two lists are also sorted separately, one after the other: usually "ArrPositive" is sorted first and "ArrNegative" next. Instead of working on the two lists sequentially, the new algorithm runs the two counting sorts simultaneously. The detailed process of this algorithm can be seen in Fig. 5.

Fig. 6 Pseudocode of Parallel Counting Sort
The main difference between Modified Counting Sort and Parallel Counting Sort lies in thread creation. Modified Counting Sort creates no additional threads, whereas Parallel Counting Sort creates two new threads so that the computer can perform both sorts simultaneously: one thread sorts "ArrPositive" and the other sorts "ArrNegative". The algorithm waits until both threads have finished sorting, and then the results are merged into one list.
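The two-thread scheme can be sketched with std::thread from the C++ standard library. This is an assumption about how the threads might be created, not the code of Fig. 6. It assumes zeros go into ArrPositive and that ArrNegative holds the magnitudes of the negative numbers, which are read back in reverse and negated after sorting; all function names are hypothetical. Only the two sorting calls run concurrently; splitting and merging stay sequential.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counting Sort for non-negative integers; k is the largest value present.
std::vector<int> countingSortNonNegative(const std::vector<int>& a) {
    if (a.empty()) return {};
    const auto range = std::minmax_element(a.begin(), a.end());
    assert(*range.first >= 0);  // Counting Sort precondition
    const int k = *range.second;
    std::vector<int> count(k + 1, 0);
    for (int v : a) ++count[v];
    for (int v = 1; v <= k; ++v) count[v] += count[v - 1];  // prefix sums = positions
    std::vector<int> out(a.size());
    for (std::size_t i = a.size(); i-- > 0;) out[--count[a[i]]] = a[i];  // stable placement
    return out;
}

// Sketch of Parallel Counting Sort: split, sort both parts on two threads, join.
std::vector<int> parallelCountingSort(const std::vector<int>& input) {
    std::vector<int> arrPositive;  // values >= 0 (assumption: zeros go here)
    std::vector<int> arrNegative;  // magnitudes of values < 0 (assumes no INT_MIN)
    for (int v : input) {
        if (v < 0) arrNegative.push_back(-v);
        else arrPositive.push_back(v);
    }

    std::vector<int> sortedPositive;
    std::vector<int> sortedMagnitudes;
    // Each thread reads its own input and writes its own result, so no lock is needed.
    std::thread positiveThread([&] { sortedPositive = countingSortNonNegative(arrPositive); });
    std::thread negativeThread([&] { sortedMagnitudes = countingSortNonNegative(arrNegative); });
    // Wait until both threads have finished before merging.
    positiveThread.join();
    negativeThread.join();

    // The largest magnitude is the most negative value, so read magnitudes backwards.
    std::vector<int> result;
    result.reserve(input.size());
    for (auto it = sortedMagnitudes.rbegin(); it != sortedMagnitudes.rend(); ++it) {
        result.push_back(-*it);
    }
    result.insert(result.end(), sortedPositive.begin(), sortedPositive.end());
    return result;
}
```

The join() calls implement the "wait until both threads finished" step described above. With GCC or Clang on Linux, such code is typically compiled with the -pthread flag.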

Result and Discussion
This section describes the testing process and compares how well both algorithms solve the sorting problem across several test cases. Modified Counting Sort and Parallel Counting Sort were tested under the same conditions, on a computer with an Intel Core i5-3210M CPU @ 2.50 GHz and 4 GB of RAM, running Windows 10 Education 64-bit (10.0, Build 17134).
The testing process proceeds in three steps: (1) preparing test cases, (2) running the algorithm with the prepared test cases, and (3) getting the execution time.

Preparing Test Cases
Five test cases were prepared to measure the execution time of each algorithm, with inputs of 10, 100, 1,000, 10,000, and 100,000 integers. Each input contains positive numbers, more than one zero, and negative numbers, all generated randomly.

Fig. 7 C++ source code to generate random number
Fig. 7 shows the source code that generates the random numbers. It starts by preparing an array (or list) to be used as the input numbers. Then, using C++ standard library functions (all declared in the "random" header), the numbers are generated one by one until every position of the input array is filled. This function needs a small modification to control "how random" the generated numbers are. The modification adds a control variable (MAX_RANGE) so the random
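A generator of this kind might look like the following sketch. It assumes the std::mt19937 engine and std::uniform_int_distribution from the "random" header, and it assumes MAX_RANGE bounds the values symmetrically, so each number falls in [-MAX_RANGE, MAX_RANGE]; the function name generateInput and its seed parameter are hypothetical. A fixed seed makes it easy to give both algorithms exactly the same input.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: `size` integers drawn uniformly from [-maxRange, maxRange].
// maxRange plays the role of the MAX_RANGE control variable.
std::vector<int> generateInput(std::size_t size, int maxRange, unsigned seed) {
    assert(maxRange >= 0);
    std::mt19937 engine(seed);                                     // deterministic for a given seed
    std::uniform_int_distribution<int> dist(-maxRange, maxRange);  // inclusive bounds
    std::vector<int> input(size);
    for (int& value : input) {
        value = dist(engine);  // fill every position one by one
    }
    return input;
}
```

A small maxRange relative to size makes repeated values (including several zeros) likely, which matches the test inputs described above.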

Running the Algorithm
Each set of randomly generated numbers from the previous step is used as the input for each algorithm. To keep the comparison valid, both algorithms use the same set of randomly generated numbers.

Getting the Execution Time
The last step of the testing process is obtaining the execution time of both algorithms. It is used to measure the difference between them and to decide which algorithm has the better execution time. In this experiment, the execution time is measured in C++. The implementation is shown in Fig. 7.
Fig. 7 C++ source code to take execution time
Fig. 7 shows that the execution time is obtained in C++ with the clock() function. This standard library function is available by including the "time.h" header. The timer is started just before the algorithm starts and stopped when the algorithm finishes. The execution time is the difference between the finish time and the start time. This difference is expressed in clock ticks; dividing it by CLOCKS_PER_SEC gives seconds, and on Windows C runtimes, where CLOCKS_PER_SEC is 1000, one tick corresponds to one millisecond (ms).
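The timing step can be sketched as below. The clock()-based helper mirrors the approach described above and converts ticks to milliseconds through CLOCKS_PER_SEC. A std::chrono::steady_clock helper is shown alongside it as an alternative: on POSIX systems clock() reports processor time summed over all threads of the process, so a two-thread sort may not look faster there, whereas steady_clock measures elapsed wall-clock time. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// clock()-based timing: ticks converted to milliseconds via CLOCKS_PER_SEC.
template <typename Work>
double clockMilliseconds(Work work) {
    const std::clock_t start = std::clock();
    work();
    const std::clock_t finish = std::clock();
    return 1000.0 * static_cast<double>(finish - start) / CLOCKS_PER_SEC;
}

// Wall-clock timing with a monotonic clock, reported in milliseconds.
template <typename Work>
double elapsedMilliseconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto finish = std::chrono::steady_clock::now();
    assert(finish >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(finish - start).count();
}
```

A call such as elapsedMilliseconds([&] { parallelSortUnderTest(input); }) would time one run, where parallelSortUnderTest stands for whichever (hypothetical) sorting function is being measured.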

Result
After running all test cases, the results of this experiment are shown in Table 1.

Fig. 8 Execution Time Chart
Fig. 8 shows the results of the experiment for both algorithms. For relatively small sets of randomly generated numbers (10, 100, and 1,000 numbers), the results do not differ much. The gap starts to widen when the algorithms receive larger inputs (10,000 and 100,000 numbers). As the chart shows, the more data is used, the wider the gap becomes.

Conclusion
Based on the results above, Parallel Counting Sort achieves a smaller execution time than Modified Counting Sort, especially on large datasets. Parallel Counting Sort reduces the execution time by around 78.57% in test case 4 and around 76% in test case 5. On small datasets, the results tend to be the same, since the execution times are almost identical.
For future work, Parallel Counting Sort should be compared with other sorting algorithms. To make the comparison more informative, it could also cover the memory usage of each algorithm, not only the execution time.

Fig. 5 Flowchart of Parallel Counting Sort Algorithm
Fig. 5 shows the differences (marked with a dotted line) from the Modified Counting Sort algorithm. Parallel Counting Sort performs the two sorting processes simultaneously, so it should run faster than Modified Counting Sort. The implementation of the Parallel Counting Sort algorithm can be seen in Fig. 6.

Table 1 Execution Time Results for Both Algorithms

Table 1 shows the execution time of both algorithms in each test case. In test case 1, both algorithms obtain the same result: they need 1 ms to sort 10 numbers. In test case 2, Modified Counting Sort and Parallel Counting Sort both run in 2 ms. In test case 3, there is only a slight difference of 1 ms between them. In test case 4, there is a significant gap: Parallel Counting Sort needs only 18 ms, while Modified Counting Sort needs 84 ms to finish sorting. In test case 5, the gap grows wider: Modified Counting Sort needs 671 ms, while Parallel Counting Sort needs 161 ms.
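The percentages quoted in the conclusion can be checked directly against the test case 4 and 5 values with a small calculation. The helper names reductionPercent and speedup are hypothetical; the reduction is (baseline - parallel) / baseline × 100, with Modified Counting Sort as the baseline.

```cpp
#include <cassert>

// Percentage reduction of `parallelMs` relative to `baselineMs`.
double reductionPercent(double baselineMs, double parallelMs) {
    assert(baselineMs > 0.0);
    return 100.0 * (baselineMs - parallelMs) / baselineMs;
}

// How many times faster the parallel run is than the baseline.
double speedup(double baselineMs, double parallelMs) {
    assert(parallelMs > 0.0);
    return baselineMs / parallelMs;
}
```

reductionPercent(84, 18) is about 78.57 and reductionPercent(671, 161) is about 76.01, matching test cases 4 and 5; the corresponding speedups are about 4.67 and 4.17.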