
Fast Sorting

COMP 103 #13


Menu

• Sorting
• Quick Sort
• Divide and Conquer Sorting Analysis

• Assignment 5 – Due Wednesday 13th


• Test (Reminder):
• There will be a 90-minute test
• 15th December 15:00-17:00
• Practice: past tests and exams available.
Divide and Conquer Sorts
To sort an array:
• Split
• Sort each part (recursively)
• Combine

[Diagram: Array → split → SubArrays → split again → sort each SubArray → combine SortedSubArrays → combine again → Sorted Array]

Where does the work happen?
• MergeSort:
• split is trivial
• combine does all the work
• QuickSort:
• split does all the work
• combine is trivial

Quicksort Introduction
• Divide-and-conquer gives fast sorting algorithms.
• Mergesort always splits in the middle.
• What if we split more intelligently?
• Choose a value (the pivot).
• Split the list into items less than the pivot and items greater than the pivot.
• Recursively sort each part.
• This splitting operation is called Partition. It can be done in linear time.
• Now no merge step is necessary.
• This algorithm is called Quicksort (Hoare, 1961).

Quicksort Pseudocode

quicksort(data)
    if data have at least 2 elements
        choose a pivot and partition the data around it
        quicksort(data less than the pivot)
        quicksort(data greater than the pivot)

Quicksort: Partition

• The partition is critical – how do we do it?


• Stable?
• Doing a stable Partition is easy with an external buffer.
• However, this makes the algorithm slower than Merge
Sort.
• The best known in-place partition algorithms are not
stable.
• In-Place?
• We can get an in-place sort if Partition is in-place.
• If we give up stability, we can do a fast, in-place
Partition.
• Thus, Quicksort is generally done in-place.
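The claim above that a stable Partition is easy with an external buffer can be illustrated with a minimal sketch, assuming a java.util.ArrayList as the buffer (the name stablePartition and its signature are illustrative, not part of the COMP 103 code):

private static <E> int stablePartition(E[] data, int min, int max,
                                       Comparator<E> comp) {
    // Stable: items < pivot keep their relative order, as do items >= pivot.
    E pivot = data[min];
    java.util.List<E> smaller = new java.util.ArrayList<E>();  // items < pivot, in order
    java.util.List<E> larger  = new java.util.ArrayList<E>();  // items >= pivot, in order
    for (int i = min + 1; i <= max; i++) {
        if (comp.compare(data[i], pivot) < 0) smaller.add(data[i]);
        else                                  larger.add(data[i]);
    }
    int pos = min;
    for (E item : smaller) data[pos++] = item;   // copy back: smaller items first,
    int pivotIndex = pos;
    data[pos++] = pivot;                         // then the pivot,
    for (E item : larger)  data[pos++] = item;   // then the rest.
    return pivotIndex;                           // final index of the pivot
}

The buffers are the cost: every call copies the whole sub-array, which is why this version loses to Merge Sort in practice.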
Quicksort: Partition
• An in-place Partition algorithm:
• Pick the first element as the pivot (this choice is improved later).
• Build a list of items ≤ the pivot at the start of the array.
• Start: the left list holds only the pivot.
• Iterate through the rest of the list.
• If an item is less than the pivot, swap it with the item just past the end of the left list, and move the left-list end mark one to the right.
• Finally, swap the pivot to the end of the left list.

Example (pivot = 8):
  8 12  4  1  9  7     start: left list = {8}
  8 12  4  1  9  7     12 ≥ 8: no change
  8  4 12  1  9  7     4 < 8: swap into the left list
  8  4  1 12  9  7     1 < 8: swap into the left list
  8  4  1  7  9 12     9 ≥ 8: no change; 7 < 8: swap into the left list
  7  4  1  8  9 12     swap the pivot into its final place
QuickSort: Partition
private static <E> int partition(E[] data, int min, int max,
                                 Comparator<E> comp){
    E pivot = data[min];   // use the first element as the pivot (improved later)
    int scan = min+1;      // next item to examine
    int mark = scan;       // items in [min+1, mark-1] are < pivot
    while(scan <= max){
        if (comp.compare(data[scan], pivot) < 0){
            swap(data, scan, mark);
            mark++;
        }
        scan++;
    }
    swap(data, min, mark-1);   // move the pivot to its final position
    return mark-1;             // index of the pivot
}
QuickSort
public static <E> void quickSort(E[] data, int low, int high,
                                 Comparator<E> comp){
    if (high - low < 1)   // zero or one item: nothing to sort
        return;
    else {                // split into two parts around the pivot
        int mid = partition(data, low, high, comp);   // mid = index of the pivot
        quickSort(data, low, mid-1, comp);
        quickSort(data, mid+1, high, comp);
    }
}

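The partition and quickSort methods above call a swap helper that is not shown on the slides. A minimal version, together with a sample call (the Integer array and the use of Java's Comparator.naturalOrder() are just for illustration):

private static <E> void swap(E[] data, int i, int j) {
    E temp = data[i];
    data[i] = data[j];
    data[j] = temp;
}

// Example call: low and high are inclusive indices.
Integer[] numbers = {8, 12, 4, 1, 9, 7};
quickSort(numbers, 0, numbers.length - 1, Comparator.naturalOrder());
// numbers is now {1, 4, 7, 8, 9, 12}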
Quicksort: Problem
• Quicksort has a big problem.
• Quicksort may not split its input into nearly equal-sized parts.
• If the pivot is chosen very poorly, Quicksort has linear recursion depth and does linear-time work at each level.
• Result: Quicksort is O(n²) in the worst case.
• And the worst case happens when the data are already sorted!

• But Quicksort's average-case time is very fast.
• It is O(n log n), and in practice significantly faster than Merge Sort.

• So Quicksort is usually very fast, and people want to use it.
• There are lots of algorithms that try to choose a better pivot.
• We’ll just look at a simple approach in COMP 103.
Improving the pivot
• Choose the pivot using median-of-three.
• Look at three items in the list: first, middle, last.
• Let the pivot be the one that is between the other two
(by <).
Simple Quicksort (pivot = first item):

  Initial State:     2 12 9 10 3 1 6     (pivot = 2)
  After Partition:   1 2 12 9 10 3 6
  Recursively sort each side of the pivot.

Median-of-3 Quicksort (pivot = median of 2, 10, 6):

  Initial State:     2 12 9 10 3 1 6     (pivot = 6)
  After Partition:   2 3 1 6 12 9 10
  Recursively sort each side of the pivot.

• This gives acceptable performance on nearly sorted data.
• But it is still O(n²) in the worst case.
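A sketch of median-of-three pivot selection (the name medianOfThree and the details are illustrative, not the COMP 103 library code). It returns the index of the median of the first, middle and last items; swapping that item into data[min] before partitioning lets the partition method shown earlier work unchanged:

private static <E> int medianOfThree(E[] data, int min, int max,
                                     Comparator<E> comp) {
    int mid = (min + max) / 2;
    E a = data[min], b = data[mid], c = data[max];
    if (comp.compare(a, b) < 0) {
        if      (comp.compare(b, c) < 0) return mid;   // a < b < c
        else if (comp.compare(a, c) < 0) return max;   // a < c <= b
        else                             return min;   // c <= a < b
    } else {
        if      (comp.compare(a, c) < 0) return min;   // b <= a < c
        else if (comp.compare(b, c) < 0) return max;   // b < c <= a
        else                             return mid;   // c <= b <= a
    }
}

// In quickSort, just before partitioning:
//   swap(data, min, medianOfThree(data, min, max, comp));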
More tweaking Quicksort
• A Minor Speed-Up: Finish with Insertion Sort
• If we stop Quicksort before it reaches the bottom of its recursion, we end up with a nearly sorted list.
• Finish sorting this list using one call to Insertion Sort.
• This is faster in practice, but still O(n²) in the worst case.

Initial State:   2 12 9 10 3 1 6
      ↓  Modified Quicksort (stop the recursion when the sublist to be sorted is small)
Nearly Sorted:   2 3 1 6 12 9 10
      ↓  Insertion Sort
Sorted:          1 2 3 6 9 10 12

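A sketch of this tweak (the cutoff value and the method names are my own choices for illustration): stop the recursion when a sublist has fewer than CUTOFF items, then make one Insertion Sort pass over the whole array. It reuses the partition method shown earlier.

private static final int CUTOFF = 10;   // illustrative value

// Quicksort that leaves small sublists unsorted.
private static <E> void roughQuickSort(E[] data, int low, int high,
                                       Comparator<E> comp) {
    if (high - low < CUTOFF) return;              // leave small pieces for later
    int mid = partition(data, low, high, comp);
    roughQuickSort(data, low, mid - 1, comp);
    roughQuickSort(data, mid + 1, high, comp);
}

// Insertion Sort: fast because the array is now nearly sorted.
private static <E> void insertionSort(E[] data, Comparator<E> comp) {
    for (int i = 1; i < data.length; i++) {
        E item = data[i];
        int j = i - 1;
        while (j >= 0 && comp.compare(data[j], item) > 0) {
            data[j + 1] = data[j];
            j--;
        }
        data[j + 1] = item;
    }
}

public static <E> void tweakedQuickSort(E[] data, Comparator<E> comp) {
    roughQuickSort(data, 0, data.length - 1, comp);
    insertionSort(data, comp);
}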
Analyzing Recursion

• There’s no inner loop!


• How do you analyse recursive algorithms?

• Cost of mergeSort:
• first recursive call
• second recursive call
• merge
• Cost of Quick Sort:
• partition
• first recursive call
• second recursive call

MergeSort Cost
Level 1:   2 × n/2  = n
Level 2:   4 × n/4  = n
Level 3:   8 × n/8  = n
Level 4:  16 × n/16 = n
    :
Level k:   n × 1    = n

How many levels?  log₂(n)

Total cost?  n log(n) = O(n log n)

n = 1,000:      about 1,000 × 10 = 10,000 steps            (log₂ 1,000 ≈ 10)
n = 1,000,000:  about 1,000,000 × 20 = 20,000,000 steps    (log₂ 1,000,000 ≈ 20)

MergeSort Cost

mergesort(data)
    if data have at least 2 elements
        mergesort(left half of data)
        mergesort(right half of data)
        merge(both halves into one sorted list)

• Cost of mergeSort = C(n)

• C(n) = C(n/2) + C(n/2) + n
       = 2 C(n/2) + n,        with base case C(1) = 0

• Recurrence Relation:
• Solve by repeated substitution & find pattern
• Solve by general method (MATH 214)
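For comparison with the Quicksort code earlier, here is a minimal array-based Merge Sort matching the pseudocode above. It is a sketch, not the COMP 103 library version: it allocates a fresh buffer on every merge for simplicity.

public static <E> void mergeSort(E[] data, int low, int high, Comparator<E> comp) {
    if (high - low < 1) return;              // zero or one item: already sorted
    int mid = (low + high) / 2;
    mergeSort(data, low, mid, comp);         // sort left half
    mergeSort(data, mid + 1, high, comp);    // sort right half
    merge(data, low, mid, high, comp);       // combine: this is where the work happens
}

@SuppressWarnings("unchecked")
private static <E> void merge(E[] data, int low, int mid, int high, Comparator<E> comp) {
    Object[] buffer = new Object[high - low + 1];   // linear additional space
    int left = low, right = mid + 1, out = 0;
    while (left <= mid && right <= high) {
        // <= keeps equal items in their original order, so the sort is stable
        if (comp.compare(data[left], data[right]) <= 0) buffer[out++] = data[left++];
        else                                            buffer[out++] = data[right++];
    }
    while (left <= mid)   buffer[out++] = data[left++];
    while (right <= high) buffer[out++] = data[right++];
    for (int i = 0; i < buffer.length; i++) data[low + i] = (E) buffer[i];
}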
Solving Recurrence Relations

C(n) = 2 C(n/2) + n
     = 2 [ 2 C(n/4) + n/2 ] + n
     = 4 C(n/4) + 2n
     = 4 [ 2 C(n/8) + n/4 ] + 2n
     = 8 C(n/8) + 3n
     = 16 C(n/16) + 4n
       :
     = 2^k C(n/2^k) + k·n

When n = 2^k, so that k = lg(n):
     = n C(1) + n lg(n)
Since C(1) = 0:
C(n) = n lg(n)
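A quick way to convince yourself of the closed form (a throwaway check, not lecture code): evaluate C(n) directly from the recurrence for powers of two and compare it with n·lg(n).

public class RecurrenceCheck {
    // C(n) = 2 C(n/2) + n, with C(1) = 0.
    static long C(long n) { return (n <= 1) ? 0 : 2 * C(n / 2) + n; }

    public static void main(String[] args) {
        for (long n = 2; n <= (1 << 20); n *= 2) {
            long lg = Long.numberOfTrailingZeros(n);   // lg(n), since n is a power of two
            System.out.println("n=" + n + "  C(n)=" + C(n) + "  n*lg(n)=" + (n * lg));
        }
    }
}

The two columns agree exactly for every power of two.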
QuickSort Cost
• If Quicksort divides the array exactly in half each time:
• C(n) = n + 2 C(n/2)
       = n lg(n) comparisons = O(n log n)     (best case)

• If Quicksort divides the array into parts of size 1 and n-1:
• C(n) = (n-1) + (n-2) + (n-3) + … + 2 + 1
       = n(n-1)/2 comparisons = O(n²)         (worst case)

• Average case?
• Very hard to analyse.
• Still O(n log n), and very good in practice.
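To see the difference concretely, wrap the Comparator so it counts comparisons. This is a sketch: it assumes the quickSort method from the earlier slide is in scope (e.g. in the same class), and the exact counts will vary with the random shuffle.

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

public class QuickSortCounts {
    public static void main(String[] args) {
        int n = 1000;
        Integer[] shuffled = new Integer[n];
        Integer[] sorted   = new Integer[n];
        for (int i = 0; i < n; i++) { shuffled[i] = i; sorted[i] = i; }
        Collections.shuffle(Arrays.asList(shuffled));   // random order

        long[] count = {0};
        Comparator<Integer> counting = (a, b) -> { count[0]++; return a.compareTo(b); };

        quickSort(shuffled, 0, n - 1, counting);
        System.out.println("random input: " + count[0] + " comparisons (roughly n lg n)");

        count[0] = 0;
        quickSort(sorted, 0, n - 1, counting);
        System.out.println("sorted input: " + count[0] + " comparisons (roughly n^2/2)");
    }
}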
Merge Sort — Analysis
• Efficiency
• Merge Sort is O(n log n) in the worst case.
• Merge Sort also has an average-case time of O(n log n).
• Requirements on Data
• Merge Sort does not require random-access data.
• However, we might write it differently for different kinds of data.
• Operations needed on the items: comparing and copying (for the array version).
• Space Usage
• For an array, efficient Merge Sort uses a buffer: linear additional space.
• Recursive Merge Sort uses stack space: the recursion depth is about log₂(n), so O(log n), i.e. logarithmic space.
• Iterative versions can be written without this memory requirement. However, log₂(n) is not much, so often we just don't worry about it.
• Stability
• Merge Sort is stable.
• Performance on Nearly Sorted Data
• Merge Sort is no faster on nearly sorted data (but it is still log-linear time).
Quicksort Analysis
• Efficiency
• Quicksort is O(n²) in the worst case.
• Quicksort has a very good O(n log n) average-case time.
• Requirements on Data
• Non-trivial pivot-selection algorithms (median-of-three and others) are only efficient for random-access data.
• Space Usage
• Quicksort can be done efficiently in-place.
• Quicksort uses space for recursion (or simulated recursion).
• Additional space: O(log n), if you are clever about it.
• This additional space need not hold any data items.
• Stability
• Efficient versions of Quicksort are not stable.
• Performance on Nearly Sorted Data
• A non-optimized Quicksort is slow on nearly sorted data: O(n²).
• Median-of-three Quicksort is O(n log n) on nearly sorted data.
When to Use …
• Bubble Sort, Selection Sort
• Never, ever, not ever.
• Insertion Sort
• For small lists.
• When you are guaranteed nearly sorted data.
• Merge Sort
• When stability is needed.
• For some kinds of non-random-access data, especially linked lists.
• Quicksort (simple)
• Never.
• Quicksort (with introspection – impose a limit on the recursion depth; see the sketch below)
• Most of the time.
• Radix Sort
• In various special cases, usually large lists of smallish numbers or
fixed-length strings.
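A sketch of the introspection idea mentioned above (the depth limit and the fallback sort are my own choices for illustration, not a prescribed implementation): track the recursion depth and, if it gets suspiciously deep, hand the sublist to a guaranteed O(n log n) sort such as the Merge Sort sketched earlier. It reuses the partition method from the Quicksort slides.

public static <E> void introQuickSort(E[] data, Comparator<E> comp) {
    // 2 * lg(n) is a common choice of depth limit for this idea.
    int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(1, data.length)));
    introQuickSort(data, 0, data.length - 1, depthLimit, comp);
}

private static <E> void introQuickSort(E[] data, int low, int high,
                                       int depthLimit, Comparator<E> comp) {
    if (high - low < 1) return;                // zero or one item
    if (depthLimit == 0) {                     // too deep: the pivots have been bad,
        mergeSort(data, low, high, comp);      // so fall back to a guaranteed n log n sort
        return;
    }
    int mid = partition(data, low, high, comp);
    introQuickSort(data, low, mid - 1, depthLimit - 1, comp);
    introQuickSort(data, mid + 1, high, depthLimit - 1, comp);
}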

