Tuesday, November 30, 2010

All you need to know about QuickSort

It would be true to say that Quicksort is one of the most popular sorting algorithms. You can find it implemented on the most of the languages and it is present in almost any core library. In Java and Go Quicksort is default sorting algorithm for some data types and it is used in in C++ STL (Introsoft which is used there begins with Quicksort). Such popularity can be explained by the fact that on average, Quicksort is one of the fastest known sorting algorithms. Interestingly that complexity of Quicksort is not less than it is for other algorithms like MergeSort or HeapSort. The best case performance is O(nlogn) and on the worst case it gives O(n^2). Latter, luckily, is exceptional case for the proper implementation. Quicksort performance is gained by the main loop which tends to make excellent usage of CPU caches. Another reason of popularity is that it doesn't need allocation of additional memory.

Personally for me Quicksort appeared as one of the most complex sorting algorithms. The basic idea is pretty simple and usually takes just a few minutes to implement. But that version, of course, if not practically usable. When it comes to details and to efficiency, it is getting more and more complicated.

Quicksort was first discovered by C.A.R. Hoare in 1962 (see "Quicksort," Computer Journal 5, 1, 1962) and in following years algorithm slightly mutated. The most known version is Three-way Quicksort. The most comprehensive of widely known ones is Dual-Pivot Quicksort. Both algorithms will be covered in that post.

The Java language was used to implement all algorithms. That post do not pretend to make adequate performance analysis. Test data used for performance comparison is incomplete and used just to show certain optimization techniques. Also, algorithm implementations are not necessary optimal. Just keep that in mind while you are reading.

Basics


The basic version of Quicksort is pretty simple and can be implemented just in few lines of code:

public static void basicQuickSort(long arr[], int beginIdx, int len) {
if ( len <= 1 )
return;

final int endIdx = beginIdx+len-1;

// Pivot selection
final int pivotPos = beginIdx+len/2;
final long pivot = arr[pivotPos];
Utils.swap(arr, pivotPos, endIdx);

// partitioning
int p = beginIdx;
for(int i = beginIdx; i != endIdx; ++i) {
if ( arr[i] <= pivot ) {
Utils.swap(arr, i, p++);
}
}
Utils.swap(arr, p, endIdx);

// recursive call
basicQuickSort(arr, beginIdx, p-beginIdx);
basicQuickSort(arr, p+1, endIdx-p);
}

The code looks pretty simple and easily readable. Pivot selection is trivial and doesn't require any explanation. The partitioning process can be illustrated using following figure:
Basic Quicksort
pointer "i" moves from the beginning to the end on array (note, that the last element of the array is skipped - we know that it the pivot). If i-th element is "<= pivot" then i-th and p-th elements are swapped and "p" pointer is moved to the next element. When partitioning is finished array will look like this: Basic Quicksort
Remember, that in the code, at the end of array there is element with pivot value, and that element is excluded from the pivoting loop. That element is put on p-th position, which makes p-th element included in "<= pivot" area. If you need more details, have a look at Wikipedia. There is pretty good explanation with lots of references. I would just emphasize your attention that algorithm consists of three main sections. These sections are pivot selection, partitioning and recursive call to sort partition. To make separation clearer the algorithm can be written down as:

public static void basicQuickSort(long arr[], int beginIdx, int len) {
if ( len <= 1 )
return;

final int endIdx = beginIdx + len - 1;
final int pivotIdx = getPivotIdx(arr, beginIdx, len);
final long pivot = arr[pivotIdx];

Utils.swap(arr, pivotIdx, endIdx);
int p = partition(arr, beginIdx, len, pivot);
Utils.swap(arr, p, endIdx);

basicQuickSort(arr, beginIdx, p-beginIdx);
basicQuickSort(arr, p+1, endIdx-p);
}

public static int partition(long[] arr, int beginIdx, int len, long pivot) {
final int endIdx = beginIdx + len - 1;
int p = beginIdx;
for(int i = beginIdx; i != endIdx; ++i) {
if ( arr[i] <= pivot ) {
Utils.swap(arr, i, p++);
}
}
return p;
}

public static int getPivotIdx(long arr[], int beginIdx, int len) {
return beginIdx+len/2;
}

Now let's have a look how it performs vs Java 1.6 sort algorithm. For the test I will generate array using following loop:

static Random rnd = new Random();
private static long[] generateData() {
long arr[] = new long[5000000];
for(int i = 0; i != arr.length; ++i) {
arr[i] = rnd.nextInt(arr.length);
}
return arr;
}

Then I run each JDK 6 Arrays.sort() and basicQuickSort() for 30 times and took the average run time as the result. New set of random data was generated for each run. The result if that exercise is this:
arr[i]=rnd.nextInt(arr.length)
Java 6 Arrays.sort1654ms
basicQuickSort1431ms

Not that bad. Now look what would happen if input data has some more repeated elements. To generated that data, I just divided nextInt() argument by 100:
arr[i]=rnd.nextInt(arr.length)arr[i]=rnd.nextInt(arr.length/100)
Java 6 Arrays.sort1654ms935ms
basicQuickSort1431ms2570ms

Now that is very bad. Obviously that simple algorithm doesn't behave well in such cases. It can be assumed that the problem is in the quality of the pivot. The worst possible pivot is the biggest or the smallest element of the array. In that case, algorithm would has O(n^2) complexity. Ideally pivot should be chosen such as it splits an array into two parts with equal sizes. It means that ideal pivot is the median on all values of given array. Practically that is not good idea - too slow. Therefore, usually, implementation uses median of 3-5 elements. The decision on the number of elements used for pivot can be based on the size of partitioned array. The code for the pivot selection may look like this:

public static int getPivotIdx(long arr[], int beginIdx, int len) {
if ( len <= 512 ) {
int p1 = beginIdx;
int p2 = beginIdx+(len>>>1);
int p3 = beginIdx+len-1;

if ( arr[p1] > arr[p2] ) { int tmp = p1; p1 = p2; p2 = tmp; }
if ( arr[p2] > arr[p3] ) { p2 = p3; }
if ( arr[p1] > arr[p2] ) { p2 = p1; }

return p2;
} else {
int p1 = beginIdx+(len/4);
int p2 = beginIdx+(len>>1);
int p3 = beginIdx+(len-len/4);
int p4 = beginIdx;
int p5 = beginIdx+len-1;

if ( arr[p1] > arr[p2] ) { int tmp = p1; p1 = p2; p2 = tmp; }
if ( arr[p2] > arr[p3] ) { int tmp = p2; p2 = p3; p3 = tmp; }
if ( arr[p1] > arr[p2] ) { int tmp = p1; p1 = p2; p2 = tmp; }
if ( arr[p3] > arr[p4] ) { int tmp = p3; p3 = p4; p4 = tmp; }
if ( arr[p2] > arr[p3] ) { int tmp = p2; p2 = p3; p3 = tmp; }
if ( arr[p1] > arr[p2] ) { p2 = p1; }
if ( arr[p4] > arr[p5] ) { p4 = p5; }
if ( arr[p3] > arr[p4] ) { p3 = p4; }
if ( arr[p2] > arr[p3] ) { p3 = p2; }
return p3;
}
}

Here are results after improvements in pivot selection strategy:
arr[i]=rnd.nextInt(arr.length)arr[i]=rnd.nextInt(arr.length/100)
Java 6 Arrays.sort1654ms935ms
basicQuickSort1431ms2570ms
basicQuickSort with 'better' pivot1365ms2482ms

Unfortunately, the improvement is almost nothing. It appeared that pivot selection is not the root cause of the problem. But still let's keep it, it doesn't harm, even helps a little bit. It also significantly reduce possibility of O(n^2) behaviour. Another suspect is the algorithm itself. It seems like it's not good enough. Obviously it doesn't perform well, when collection has repeated elements. Therefore something has to be changed.

Three-way partitioning


The way to get around that problem is three-way-partitioning. As a result of such partitioning, elements which are equal to the pivot are put in the middle of the array. Elements which are bigger than pivot are put in the right side of the array and ones which are smaller on the left side, appropriately.
Implementation of that partitioning method consists of two stages. In the first stage arrays is scanned by two pointers ("i" and "j") which are approaching in opposite directions. Elements which are equals to pivot are moved to the ends of array:

It can be seen that after the first stage elements which are equal to the pivot are located on the edges of the array. On the second stage these elements are moved to the middle. That is now their final position and they can be be excluded from the further sorting:

After implementation of such algorithm partitioning function is getting much more complicated. In that implementation the result of the partitioning is lengths of two bound partitions:

public static long partition(long[] arr, int beginIdx, int endIdx, long pivot) {
int i = beginIdx-1;
int l = i;
int j = endIdx+1;
int r = j;
while ( true ) {
while(arr[++i] <> pivot){}

if ( i >= j )
break;

Utils.swap(arr, i, j);
if ( arr[i] == pivot ) {
Utils.swap(arr, i, ++l);
}
if ( arr[j] == pivot ) {
Utils.swap(arr, j, --r);
}
}
// if i == j then arr[i] == arr[j] == pivot
if ( i == j ) {
++i;
--j;
}

final int lLen = j-l;
final int rLen = r-i;

final int pLen = l-beginIdx;
final int exchp = pLen > lLen ? lLen: pLen;
int pidx = beginIdx;
for(int s = 0; s <= exchp; ++s) {
Utils.swap(arr, pidx++, j--);
}
final int qLen = endIdx-r;
final int exchq = rLen > qLen ? qLen : rLen;
int qidx = endIdx;
for(int s = 0; s <= exchq; ++s) {
Utils.swap(arr, qidx--, i++);
}

return (((long)lLen)<<32)|rlen;
}

The pivot selection has to be changed as well, but more just for convenience, the idea remains absolutely the same. Now it returns actual value of pivot, instead of index:

public static long getPivot(long arr[], int beginIdx, int len) {
if ( len <= 512 ) {
long p1 = arr[beginIdx];
long p2 = arr[beginIdx+(len>>1)];
long p3 = arr[beginIdx+len-1];

return getMedian(p1, p2, p3);
} else {
long p1 = arr[beginIdx+(len/4)];
long p2 = arr[beginIdx+(len>>1)];
long p3 = arr[beginIdx+(len-len/4)];
long p4 = arr[beginIdx];
long p5 = arr[beginIdx+len-1];

return getMedian(p1, p2, p3, p4, p5);
}
}

And here is the main method, which is slightly changed as well:

public static void threeWayQuickSort(long[] arr, int beginIdx, int len) {
if ( len < 2 )
return;

final int endIdx = beginIdx+len-1;
final long pivot = getPivot(arr, beginIdx, len);
final long lengths = threeWayPartitioning(arr, beginIdx, endIdx, pivot);

final int lLen = (int)(lengths>>32);
final int rLen = (int)lengths;

threeWayQuickSort(arr, beginIdx, lLen);
threeWayQuickSort(arr, endIdx-rLen+1, rLen);
}

now let's compare it with Java 6 sort:
arr[i]=rnd.nextInt(arr.length)arr[i]=rnd.nextInt(arr.length/100)
Java 6 Arrays.sort1654ms935ms
basicQuickSort1431ms2570ms
basicQuickSort with 'better' pivot1365ms2482ms
Three-way partitioning Quicksort1330ms829ms

Huh, impressive! It is faster than standard library, which, by he way, implements the same algorithm. To be honest I was surprised, when found that it is such an easy task to beat standard library.
But what about making it even faster? There is one trick which always helps and it works for all sorting algorithms which work with consecutive memory. That trick is Insertion sort. Although is has big chance of O(n^2), it appears to be very very effective on the small arrays and always gives some performance improvements. Especially that is noticeable when input data is not sorted and there are not many repeated elements. All you need to do is just add it at the beginning of sorting method:

public static void threeWayQuickSort(long[] arr, int beginIdx, int len) {
if ( len < 2 )
return;

if ( len < 17 ) {
InsertionSort.sort(arr, beginIdx, len);
return;
}

final int endIdx = beginIdx+len-1;
final long pivot = getPivot(arr, beginIdx, len);
final long lengths = threeWayPartitioning(arr, beginIdx, endIdx, pivot);

final int lLen = (int)(lengths>>32);
final int rLen = (int)lengths;

threeWayQuickSort(arr, beginIdx, lLen);
threeWayQuickSort(arr, endIdx-rLen+1, rLen);
}

and run the test again:
arr[i]=rnd.nextInt(arr.length)arr[i]=rnd.nextInt(arr.length/100)
Java 6 Arrays.sort1654ms935ms
basicQuickSort1431ms2570ms
basicQuickSort with 'better' pivot1365ms2482ms
Three-way partitioning Quicksort1330ms829ms
Three-way partitioning Quicksort with Insertion sort1155ms818ms

now standard library looks just awful. It looks now that all is said and done. But it in reality that's not the end of the story and there is something else to talk about.

Dual-pivot Quicksort


Moving forward, I found that Java 7 is much more advanced and performs much faster than Java 6 version and outperforms all previous tests:
arr[i]=rnd.nextInt(arr.length)arr[i]=rnd.nextInt(arr.length/100)
Java 6 Arrays.sort1654ms935ms
Java 7 Arrays.sort951ms764ms
basicQuickSort1431ms2570ms
basicQuickSort with 'better' pivot1365ms2482ms
Three-way partitioning Quicksort1330ms829ms
Three-way partitioning Quicksort with Insertion sort1155ms818ms

After several seconds of very exciting research study it was found that Java 7 uses new version of Quicksort algorithm which was discovered just in 2009 by Vladimir Yaroslavskiy and named Dual-Pivot QuickSort. Interestingly that after some search in internet, I have found algorithm called "Multiple pivot sorting" which was published in 2007. It seems like generic case of "Dual-Pivot QuickSort" where is possible to have any number of pivots.
As you may notice from the name, the main difference of that algorithm is that it is using two pivots, instead of one. Coding now is getting even more complicated. The simplest version of that algorithm may look like this:

public static void dualPivotQuicksort(long arr[], int beginIdx, int len) {
if ( len < 2 )
return;

final int endIdx = beginIdx+len-1;

long pivot1 = arr[beginIdx];
long pivot2 = arr[endIdx];

if ( pivot1 == pivot2 ) {
final long lengths = QuickSort.threeWayPartitioning(arr, beginIdx, endIdx, pivot1);
final int lLen = (int)(lengths>>32);
final int rLen = (int)lengths;

dualPivotQuicksort3(arr, beginIdx, lLen);
dualPivotQuicksort3(arr, endIdx-rLen+1, rLen);
} else {
if ( pivot1 > pivot2 ) {
long tmp = pivot1;
pivot1 = pivot2;
pivot2 = tmp;
Utils.swap(arr, beginIdx, endIdx);
}

int l = beginIdx;
int r = endIdx;
int p = beginIdx;

while ( p <= r ) {
if ( arr[p] < pivot1 ) {
Utils.swap(arr, l++, p++);
} else if ( arr[p] > pivot2 ) {
while ( arr[r] > pivot2 && r > p ) {
--r;
}
Utils.swap(arr, r--, p);
} else {
++p;
}
}
if ( arr[l] == pivot1 ) ++l;
if ( arr[r] == pivot2 ) --r;

dualPivotQuicksort3(arr, beginIdx, l-beginIdx);
dualPivotQuicksort3(arr, l, r-l+1);
dualPivotQuicksort3(arr, r+1, endIdx-r);
}
}

First code picks up two pivots. If pivots are the same, it means we have just one pivot and in that case we can used three-way method for partitioning. If pivots are different, then partitioning process will look like this:

Scanning pointer "p" is moving from the beginning of array. If current element is "<> pivot1", then r-th element is swapped with p-th and "r" pointer is moved to next element backwards. All stops when "p" becomes less than "r". After partitioning, array will look like this:

When partition is finished, algorithms is called recursively for each partition.

Reader shouldn't expect good performance from the provided code, it is not fast and performs even worse than Java 6 Arrays.sort. I was provided just to illustrate the concept.

To be honest I failed to make my implementation to perform any better than version from Java 7. I must admit, that Yaroslav made a very good job there. Therefore I do not think that there is any sense in discussing my implementation here in details.

But, if someone wants to challenge Java 7 version I can point to some direction for optimizations. Firstly, which is seems obvious? is pivot selection. Another easy improvement is Insertions sort at the beginning. Also, I have noticed that that algorithm is very sensitive to inlining, so there is sense to inline Utils.swap(). As other option, you can decided to go thought the middle partition and move elements equals to pivot1 or pivot2 to their final positions which will exclide them from the further sorting. I found that it is effective for relatively (<=512 elements) small arrays. You can also have a look at source from the Java 7 and try to implement some tricks from there. Be ready to spend a lot of time :) All in all, it can be seen that over the years sorting is getting better and better. And that statement doesn't only relate to Quicksort. Other sorting algorithms are improving as well. As examples can be considered Introsoft or Timsort. However, it would be true to say that nothing really new was discovered in that area since 1960s-1980s. Hopefully we will be lucky enough to see something completely new and radical in the future.

For ones who want to dig deeper, as the starting point, I would suggest to visit following links:

3 comments:

assylias said...

It would be interesting to give more details about how you measure the time taken by each method (in particular whether you adjust for external factors such as garbage collection and JIT compilation).

assylias said...

ps: I should have started by saying that it was an interesting post ;-)

Stas said...

hello, all tests are done without GC running and without JITing (there was a warm-up), so should be pretty accurate. Do not take absolute value of these mesuments as something meaningful, the only useful information is relative performance.
Performance of quicksort in java7, actually, is really noticable. I would event bet that it will beat C++ implementation from standard library (Introsort). BTW, would be interesting if someone really does that comparison.