Purely Functional Data Structures & Algorithms : Selection Sort

*Updated @ 2012-08-31 02:08:58 due to internet pedantry*

Previously, previously.

According to Wikipedia :

In computer science, a Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n2) time complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

(A functional implementation of selection sort is however, not an in-place sort.)
Behold the abomination which is the imperative implementation (from the Wikipedia link) :

int i,j;
int iMin;

for (j = 0; j < n-1; j++) {
    iMin = j;
    for ( i = j+1; i < n; i++) {
        if (a[i] < a[iMin]) {
            iMin = i;

    if ( iMin != j ) {
        swap(a[j], a[iMin]);

Now, the functional equivalent in Haskell :

selectionSort :: [Int] -> [Int] -> [Int]
selectionSort sorted [] = reverse sorted
selectionSort sorted unsorted = selectionSort (min:sorted) (delete min unsorted)
                     where min = minimum unsorted

Or in Shen :

(define selection-sort-aux
  { (list number) --> (list number) --> (list number) }
  Sorted []       -> (reverse Sorted)
  Sorted Unsorted -> (let Min (minimum Unsorted)
        (selection-sort-aux (cons Min Sorted) (remove-first Min Unsorted))))

Yes. These functional snippets use their respective implementations of the list type (which is not an efficient persistent data type in either Haskell or Shen for accesses or updates). Replacing the List type with Data.Sequence(a persistent data type with efficient constant access and update) for the Haskell snippet is trivial. I’ll leave that as an exercise to the reader. Shen is too new to support these efficient persistent types at the moment but implementations will appear in the future and changing the snippet would also be trivial. A Clojure implementation using it’s already built in efficient persistent types would also be trivial.

The complete code can be found here.

Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

*Updated 08-23-2012 01:04:38*
Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.

A Haskell version of the previous code using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
     { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
     deriving (Read,  Show)

-- Return id of root object
findRoot :: DisjointSet -> Int -> Int
findRoot set p | p == parent = p
               | otherwise   = findRoot set parent
                parent = index (ids set) (p - 1)

-- Are objects P and Q connected ?
connected :: DisjointSet -> Int -> Int -> Bool
connected set p q = (findRoot set p) == (findRoot set q)

-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -> Int -> Int -> DisjointSet
quickUnion set p q | i == j = set
                   | otherwise = DisjointSet cnt rids rsizes
                        (i, j)   = (findRoot set p, findRoot set q)
                        (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
                        (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1)
                        -- Always make smaller root point to the larger one
                        (rids, rsizes) = if psmaller
                                         then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
                                         else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10

creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False

creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Complete code

Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 [Count Ids Sizes] : disjoint-set;)
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
\* Create a new disjoint-set type *\
(define new
 { number --> disjoint-set }
 N -> [N (range 1 N) (vector-init 1 N)])
\* Return the number of disjoint sets *\
(define count
 { disjoint-set --> number }
 [Count Ids Sizes] -> Count)
\* Return id of root object *\
(define find-root
 { disjoint-set --> number --> number }
 [Count Ids Sizes] P -> (let Parent 
                         \* Path Compression *\
                         (<-vector Ids (<-vector Ids P))
                         (if (= P Parent)
                             (find-root [Count Ids Sizes] Parent)))
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --> number --> number --> boolean }
 UF P Q -> (= (find-root UF P) (find-root UF Q)))
\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --> number --> number --> disjoint-set }
 [Count Ids Sizes] P Q 
 -> (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
After running our test we get the following output.
(50+) (test 10)
creating union find with 10 objects ...
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ... 
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number
All the code can be found here.

Codebreaker – A new film about the life of Alan Turing

CODEBREAKER tells the story of one of the most important people of the 20th century.  Alan Turing set in motion the computer age and his World War II codebreaking helped save two million lives.  Yet few people have heard his name, know his tragic story, or understand his legacy.  In 1954, Turing committed suicide at age 41 after being forced to undergo hormone therapy to “fix” his sexual orientation.  He  left behind a lasting legacy and lingering questions about what else he might have accomplished if society had embraced his unique genius instead of rejecting it.  100 years after his birth, an international production team takes viewers on a journey to rediscover the man and the mystery.

Previously, previously and previously.

Bayes’s Theorem is more powerful than Jesus

Richard Carrier puts forward a fantastic approach to verifying history in his latest book :

Proving History: Bayes’s Theorem and the Quest for the Historical Jesus

“… historian Richard C. Carrier proposes Bayes’s theorem as a solution to the problem of establishing reliable historical criteria. He demonstrates that valid historical methods—not only in the study of Christian origins but in any historical study—can be described by, and reduced to, the logic of Bayes’s theorem. Conversely, he argues that any method that cannot be reduced to Bayes’s theorem is invalid and should be abandoned.”

And in case you’re wondering, yes it’s Bayes’s Theorem and not Bayes’ Theorem.