Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
\**\
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 =================================================================
 [Count Ids Sizes] : disjoint-set;)
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
\* Create a new disjoint-set type *\
(define new
 { number --> disjoint-set }
 N -> [N (range 1 N) (vector-init 1 N)])
\* Return the number of disjoint sets *\
(define count
 { disjoint-set --> number }
 [Count Ids Sizes] -> Count)
\* Return id of root object *\
(define find-root
 { disjoint-set --> number --> number }
 [Count Ids Sizes] P -> (let Parent 
                         \* Path Compression *\
                         (<-vector Ids (<-vector Ids P))
                         (if (= P Parent)
                             P
                             (find-root [Count Ids Sizes] Parent)))
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --> number --> number --> boolean }
 UF P Q -> (= (find-root UF P) (find-root UF Q)))
\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --> number --> number --> disjoint-set }
 [Count Ids Sizes] P Q 
 -> (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
After running our test we get the following output.
(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ... 
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number
All the code can be found here.

7 thoughts on “Purely Functional Data Structures & Algorithms : Union-Find

  1. As far as I’m aware there’s no _purely_ functional union-find. There is, however, an _apparently_ functional union-find, courtesy of Conchon and Filliâtre (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.8494). They use Henry Baker’s shallow binding trick (specifically his “rerooting” trick) to build their union-find on top of a persistent array/map. Even though behind the scenes there’s “massive use of side effects” (to quote the authors), the structure looks purely functional from the user’s perspective.

    I translated this into Smalltalk, writing up my experiences: http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk

    • I’m doing effectively the same with this code, hence the vector-copy stuff. Yes, it’s only ‘purely functional’ in the sense that it’s a non-destructive function.

  2. Indeed, although this is purely functional, this has completely different complexity, as basically the entire structure is copied on every union. This is very far from the inverse Ackermann function complexity :(

  3. Pingback: Purely Functional Data Structures & Algorithms : Union-Find (Haskell) | imagine27

  4. Pingback: Purely Functional Data Structures & Algorithms : Selection Sort | imagine27

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>