It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests, and the complementary algorithm: union-find.

*In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:*

*Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.*

*Union: Join two subsets into a single subset.*
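Before getting to the weighted Shen implementation below, the two operations can be illustrated with the simplest possible representation, "quick-find", where each element maps directly to a set id (a Python sketch with made-up names, kept non-destructive in the spirit of this series):

```python
# A minimal "quick-find" union-find: ids[i] is the id of the set containing i.
# find is O(1), but union must rewrite every entry, so it is O(n).
def make_sets(n):
    return list(range(n))          # element i starts in its own set

def find(ids, p):
    return ids[p]                  # the stored id *is* the set

def union(ids, p, q):
    pid, qid = ids[p], ids[q]
    # non-destructive: build a fresh list instead of mutating in place
    return [qid if i == pid else i for i in ids]

ids = make_sets(5)
ids = union(ids, 0, 1)
assert find(ids, 0) == find(ids, 1)
assert find(ids, 0) != find(ids, 2)
```

The forest-based version developed below replaces the direct set id with a parent link, which makes union cheap at the cost of a tree walk in find.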

My inspiration comes from Sedgewick and Wayne’s class over at Coursera: Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.

I’m always curious how data structures and algorithms translate from their imperative counterparts (usually in Java), which are the norm for most classes on the subject and in most textbooks.

I think this is a relatively unexplored part of the field in comparison with the usual imperative approach to algorithms and data structures. So here we go with another example.

First we define our disjoint-set type.

```
\**\
\* Disjoint set data type (weighted and using path compression) demonstrating *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
    Count : number ; Ids : (vector number) ; Sizes : (vector number);
    =================================================================
    [Count Ids Sizes] : disjoint-set;)
```
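For readers unfamiliar with Shen, the `[Count Ids Sizes]` triple corresponds to a record like the following (a hypothetical Python mirror; note the Shen version below is 1-indexed where this sketch is 0-indexed):

```python
from typing import NamedTuple, List

class DisjointSet(NamedTuple):
    count: int          # number of disjoint subsets remaining
    ids: List[int]      # ids[i] is the parent of element i (a root points to itself)
    sizes: List[int]    # sizes[i] is the size of the tree rooted at i, if i is a root

def new(n: int) -> DisjointSet:
    # every element is initially its own root, in a subset of size 1
    return DisjointSet(n, list(range(n)), [1] * n)
```

Keeping the subtree sizes alongside the parent links is what makes the weighted union below possible.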

Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.

```
\* Create a new disjoint-set type *\
(define new
  { number --> disjoint-set }
  N -> [N (range 1 N) (vector-init 1 N)])
```

```
\* Return the number of disjoint sets *\
(define count
  { disjoint-set --> number }
  [Count Ids Sizes] -> Count)
```

```
\* Return id of root object *\
(define find-root
  { disjoint-set --> number --> number }
  [Count Ids Sizes] P ->
    (let Parent \* Path Compression *\ (<-vector Ids (<-vector Ids P))
      (if (= P Parent)
          P
          (find-root [Count Ids Sizes] Parent))))
```
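The grandparent lookup `(<-vector Ids (<-vector Ids P))` is the path-compression idea in its "path halving" form. In an imperative setting it is usually written with a write-back, so that every traversal actually flattens the tree (a Python sketch of that standard variant, not a transliteration of the Shen code above):

```python
def find_root(ids, p):
    # Follow parent links to the root, halving the path as we go:
    # each visited node is re-pointed at its grandparent (a destructive update).
    while ids[p] != p:
        ids[p] = ids[ids[p]]   # path halving
        p = ids[p]
    return p

ids = [0, 0, 1, 2, 3]          # a chain: 4 -> 3 -> 2 -> 1 -> 0
assert find_root(ids, 4) == 0
assert ids[4] == 2             # 4 now points at its old grandparent
```

The purely functional version cannot write the shortcut back without producing a new vector, which is exactly the copying issue discussed in the comments below.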

Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.

```
\* Are objects P and Q in the same set ? *\
(define connected
  { disjoint-set --> number --> number --> boolean }
  UF P Q -> (= (find-root UF P) (find-root UF Q)))
```

```
\* Replace sets containing P and Q with their union *\
(define quick-union
  { disjoint-set --> number --> number --> disjoint-set }
  [Count Ids Sizes] P Q ->
    (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
```
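To make the copy-then-update shape explicit, here is a rough Python transliteration (0-indexed, with plain lists standing in for Shen vectors; `find_root_pure` is a hypothetical non-compressing helper added for self-containment):

```python
def find_root_pure(ids, p):
    # plain root lookup, no mutation
    while ids[p] != p:
        p = ids[p]
    return p

def quick_union(ds, p, q):
    # ds is a (count, ids, sizes) triple, as in the Shen datatype
    count, ids, sizes = ds
    i, j = find_root_pure(ids, p), find_root_pure(ids, q)
    if i == j:
        return (count, list(ids), list(sizes))
    cids, csizes = list(ids), list(sizes)      # copy before updating
    # always make the smaller root point to the larger one
    if sizes[i] < sizes[j]:
        cids[i] = j
        csizes[j] = sizes[i] + sizes[j]
    else:
        cids[j] = i
        csizes[i] = sizes[i] + sizes[j]
    return (count - 1, cids, csizes)

ds = (3, [0, 1, 2], [1, 1, 1])
ds = quick_union(ds, 0, 1)
ds = quick_union(ds, 1, 2)
assert ds[0] == 1                              # one subset remains
assert find_root_pure(ds[1], 0) == find_root_pure(ds[1], 2)
```

As in the Shen version, the input triple is never mutated: each union returns a fresh structure, so older versions remain valid.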

After running our test we get the following output.

```
(50+) (test 10)
creating union find with 10 objects ... DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
...
creating unions ... DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true
run time: 0.0 secs
1 : number
```

All the code can be found here.

As far as I’m aware there’s no _purely_ functional union-find. There is, however, an _apparently_ functional union-find, courtesy of Conchon and Filliâtre (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.8494). They use Henry Baker’s shallow binding trick (specifically his “rerooting” trick) to build their union-find on top of a persistent array/map. Even though behind the scenes there’s “massive use of side effects” (to quote the authors), the structure looks purely functional from the user’s perspective.
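Their persistent array can be sketched in a few lines of Python (a drastic simplification of the paper’s OCaml; the class and function names here are made up for illustration). Each version is either a real array or a "diff" node recording how it differs from another version, and reading an old version reroots the real array to it:

```python
class PArray:
    # contents is either ('arr', list) or ('diff', index, value, base_version)
    def __init__(self, contents):
        self.contents = contents

def parray(lst):
    return PArray(('arr', list(lst)))

def reroot(pa):
    # Make pa hold the real array, turning the version it pointed at
    # into a diff against pa (Baker's shallow-binding trick).
    if pa.contents[0] == 'arr':
        return
    _, i, v, base = pa.contents
    reroot(base)
    arr = base.contents[1]
    old = arr[i]
    arr[i] = v
    pa.contents = ('arr', arr)
    base.contents = ('diff', i, old, pa)

def get(pa, i):
    reroot(pa)
    return pa.contents[1][i]

def set_(pa, i, v):
    # Returns a new version; the old one becomes a diff against it.
    reroot(pa)
    arr = pa.contents[1]
    old = arr[i]
    arr[i] = v
    res = PArray(('arr', arr))
    pa.contents = ('diff', i, old, res)
    return res

a = parray([0, 0, 0])
b = set_(a, 1, 7)
assert get(b, 1) == 7
assert get(a, 1) == 0      # reading the old version reroots back to it
assert get(b, 1) == 7      # and the new version still reads correctly
```

From the outside every version behaves like an immutable array, even though there is only ever one real array being shuffled around behind the scenes.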

I translated this into Smalltalk, writing up my experiences: http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk

I’m doing effectively the same with this code, hence the vector-copy stuff. Yes, it’s only ‘purely functional’ in the sense that it’s a non-destructive function.

Indeed, although this is purely functional, it has completely different complexity, as essentially the entire structure is copied on every union. This is very far from the inverse Ackermann function complexity :(

For M finds over a set of N elements, the worst-case order of growth is still O((M+N) lg* N), or O(5(M+N)).

You’re right: for M unions over a set of N elements, the worst-case order of growth is O(5(M+N) + 2MN) because of the vector copying.
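The extra 2MN term is easy to see from the code: each call to quick-union copies both N-element vectors (Ids and Sizes). A quick back-of-the-envelope check (hypothetical instrumentation, not part of the original code):

```python
def copied_slots(n, unions):
    # each non-destructive union copies the Ids vector and the Sizes vector in full
    copied = 0
    for _ in range(unions):
        copied += 2 * n        # vector-copy of Ids plus vector-copy of Sizes
    return copied

# M = 9 unions over N = 10 elements copy 2*M*N = 180 vector slots
assert copied_slots(10, 9) == 180
```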

For now, this is only due to a limitation of Shen’s persistent data types (lists in this example) not being efficient for access and update. Porting the code to Haskell, ML or Clojure would achieve the same performance.

In Haskell …

http://jng.imagine27.com/index.php/2012-08-22-144618_purely-functional-data-structures-algorithms-union-find-haskell.html
