Parallel merge-sort leveraging futures in Racket

Written by Dominik Joe Pantůček on October 10, 2019.

Computers can perform more and more operations per second, but without further effort that does not make programs much faster. In the past, clock speed was the best measure of a processor’s performance. With multi-core and multi-processor systems this is no longer true, and programmers need to pay attention to how they harness the computer’s performance. Read on to find out how Racket – a modern Scheme dialect – makes it quite easy to leverage parallel processing to speed up sorting.

This story began a few months ago when we were given a task to generate all possible combinations of numbers, calculate their sums and sort these sums from largest to smallest. Sounds like a rather simple combinatorics and computer science task. The problem is the algorithmic time complexity and memory requirements. For 10 numbers that means $2^{10}=1024$ combinations, for 20 it is $2^{20}=1048576$, and for 27 – which was the count we faced – it is $2^{27}=134217728$.
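To make the combinatorics concrete, here is an illustrative sketch (not the original program) of enumerating all $2^n$ subset sums: each integer from $0$ to $2^n-1$ is read as a bit mask selecting which numbers to add.

```racket
#lang racket
;; Illustrative sketch (not the original program): enumerate all 2^n
;; subset sums of a list of numbers.  Each integer from 0 to 2^n - 1
;; is read as a bit mask selecting which numbers participate in the sum.
(define (subset-sums nums)
  (define v (list->vector nums))
  (define n (vector-length v))
  (for/vector ((bits (in-range (expt 2 n))))
    (for/sum ((i (in-range n))
              #:when (bitwise-bit-set? bits i))
      (vector-ref v i))))

(vector-length (subset-sums '(1 2 3)))  ; 2^3 = 8 sums
```

With 27 numbers this vector grows to $2^{27}$ entries, which is exactly where the memory and sorting problems below come from.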

To make things worse, in order to sort these numbers, not only is it necessary to generate them in that many steps, but it is also necessary to store them in memory. On a 64-bit platform that means 8 bytes per number – $2^{27}\times 2^3=2^{30}$ bytes, exactly 1 GiB in this case. And once they are stored, the most efficient way to sort them is merge sort, which has algorithmic time complexity $O(n\cdot\log_2 n)$ and space complexity $O(n)$.

The question is – can we do better? Luckily, the answer is: yes, we can! Merge sort[1] is typically presented as a single-threaded algorithm, but it can be (partially) parallelized, and it is not even that hard to make it run in parallel. Another question might be – can you do the same? And again, the answer is: yes, you can.
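The idea behind the parallelization can be sketched in a few lines. This is a minimal illustration of the principle – not the package’s actual implementation, which works destructively on vectors – using lists for brevity: split the input, sort one half in a future[5] while the current thread sorts the other, then merge. The `depth` argument bounds the futures tree, so at most $2^{depth}$ workers run concurrently.

```racket
#lang racket
(require racket/future)

;; Standard sequential merge of two sorted lists.
(define (merge a b)
  (cond ((null? a) b)
        ((null? b) a)
        ((<= (car a) (car b)) (cons (car a) (merge (cdr a) b)))
        (else (cons (car b) (merge a (cdr b))))))

;; Minimal sketch of the idea (not the package's implementation):
;; sort the right half in a future while this thread sorts the left
;; half, then merge.  `depth` bounds the binary tree of futures.
(define (parallel-merge-sort lst (depth 2))
  (cond
    ((or (null? lst) (null? (cdr lst))) lst)
    (else
     (define-values (left right)
       (split-at lst (quotient (length lst) 2)))
     (if (> depth 0)
         (let ((f (future (lambda ()
                            (parallel-merge-sort right (sub1 depth))))))
           (merge (parallel-merge-sort left (sub1 depth))
                  (touch f)))
         (merge (parallel-merge-sort left 0)
                (parallel-merge-sort right 0))))))

(parallel-merge-sort '(5 3 8 1 9 2))  ; => '(1 2 3 5 8 9)
```

The released package applies the same divide-in-parallel, merge-on-join scheme to vectors in place, which avoids the allocation overhead this list-based sketch would incur on $2^{27}$ elements.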

As you might have expected, the actual program we produced is written in Racket[2]. Racket has a nice package system, so we are releasing the core of our parallel merge sort – a generic version for arbitrary data types and an optimized version for fixnums[3] – under the MIT license[4]. You can get it installed easily:

raco pkg install futures-sort

And the usage is pretty straightforward too:

#lang racket
(require futures-sort)
(define test-vector (for/vector ((i 1000)) (random 1000000)))
(vector-futures-sort! test-vector)

Of course, the advantages are more visible for much larger vector sizes than in this example.

By default the code tries to use all available cores and hyper-threads[5] as reported by Racket. You can find a detailed description and all configuration options in the documentation, which is readily available online and with your package installation.
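To give an idea of how such a default is derived (a sketch of the reasoning, not the package’s actual configuration code – see the package documentation for the real parameter): a binary tree of futures of depth $d$ yields up to $2^d$ concurrent workers, so a depth matching the machine can be computed from Racket’s reported processor count.

```racket
#lang racket
(require racket/future  ; processor-count
         racket/math)   ; exact-ceiling

;; A depth-d binary tree of futures gives up to 2^d concurrent
;; workers, so pick the smallest depth whose tree covers all the
;; cores/hyper-threads Racket reports.
(define depth (exact-ceiling (log (processor-count) 2)))
```

On an 8-thread laptop this yields a depth of 3, matching the “up to 8 parallel threads” measurement below.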

Now the question remains – how does it really perform? Without any parallelism it performs slightly better on the Racket 3m variant and slightly but measurably worse on the CS variant. With at least 2 threads at its disposal, it beats all competition by a huge margin.

Figure 1: Performance on ThinkPad x280 – up to 8 parallel threads.

Depicted here in Figure 1 is the performance measurement on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz running Linux. The depth is the depth of the binary tree of futures, so the maximum level of parallelism is always $2^{depth}$.

Figure 2: Performance on VM with 16 VCPUs

And here in Figure 2 you can see the same measurement on a VM[6] with 16 VCPUs running on a 16-core / 32-HT Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz.


We really enjoyed working on a highly academic topic with practical use, and we are happy to release the code under a very permissive license. Hope you like it too – and see you next Thursday with more Cryptoucan™ information again!


1. Wikipedia contributors. (2019, September 26). Merge sort. In Wikipedia, The Free Encyclopedia. Retrieved 14:48, October 9, 2019, from

2. Racket: solve problems – make languages, available online at

3. Fixnums, available online at

4. Wikipedia contributors. (2019, September 23). MIT License. In Wikipedia, The Free Encyclopedia. Retrieved 14:52, October 9, 2019, from

5. Wikipedia contributors. (2019, September 20). Hyper-threading. In Wikipedia, The Free Encyclopedia. Retrieved 14:52, October 9, 2019, from

6. Wikipedia contributors. (2019, September 21). Virtual machine. In Wikipedia, The Free Encyclopedia. Retrieved 14:53, October 9, 2019, from