# Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

• Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
• Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
```\**\
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
Count : number ; Ids : (vector number) ; Sizes : (vector number);
=================================================================
[Count Ids Sizes] : disjoint-set;)```
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
```\* Create a new disjoint-set type *\
(define new
{ number --> disjoint-set }
N -> [N (range 1 N) (vector-init 1 N)])```
```\* Return the number of disjoint sets *\
(define count
{ disjoint-set --> number }
[Count Ids Sizes] -> Count)```
```\* Return id of root object *\
(define find-root
{ disjoint-set --> number --> number }
[Count Ids Sizes] P -> (let Parent
\* Path Compression *\
(<-vector Ids (<-vector Ids P))
(if (= P Parent)
P
(find-root [Count Ids Sizes] Parent)))```
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
```\* Are objects P and Q in the set ? *\
(define connected
{ disjoint-set --> number --> number --> boolean }
UF P Q -> (= (find-root UF P) (find-root UF Q)))```
```\* Replace sets containing P and Q with their union *\
(define quick-union
{ disjoint-set --> number --> number --> disjoint-set }
[Count Ids Sizes] P Q
-> (let UF [Count Ids Sizes]
I (find-root UF P)
J (find-root UF Q)
SizeI (<-vector Sizes I)
SizeJ (<-vector Sizes J)
SizeSum (+ SizeI SizeJ)
CIds (vector-copy Ids)
CSizes (vector-copy Sizes)
(if (= I J)
[Count CIds CSizes]
\* Always make smaller root point to larger one *\
(do (if (< SizeI SizeJ)
(do (vector-> CIds I J) (vector-> CSizes J SizeSum))
(do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
[(- Count 1) CIds CSizes]))))```
After running our test we get the following output.
```(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ...
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number```
All the code can be found here.

## 7 thoughts on “Purely Functional Data Structures & Algorithms : Union-Find”

1. Frank Shearar on said:

As far as I’m aware there’s no _purely_ functional union-find. There is, however, an _apparently_ functional union-find, courtesy of Conchon and Filliâtre (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.8494). They use Henry Baker’s shallow binding trick (specifically his “rerooting” trick) to build their union-find on top of a persistent array/map. Even though behind the scenes there’s “massive use of side effects” (to quote the authors), the structure looks purely functional from the user’s perspective.

I translated this into Smalltalk, writing up my experiences: http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk

• jng on said:

I’m doing effectively the same with this code, hence the vector-copy stuff. Yes, it’s only ‘purely functional’ in the sense that it’s a non-destructive function.

2. Eugene Kirpichov on said:

Indeed, although this is purely functional, this has completely different complexity, as basically the entire structure is copied on every union. This is very far from the inverse Ackermann function complexity :(

• jng on said:

For M finds over a set of N elements, the worst-case order of growth is still O((M+N)lg*N) or O(5(M+N)).
You’re right, for M unions over a set of N elements, the worst-case order of growth is O(5(M+N) + 2MN) because of the vector copying.

For now, this is only due to a limitation of Shen’s persistent data types(list in this example) not being efficient(access and update). Porting the code to Haskell, ML or Clojure would achieve the same performance.

• jng on said:

This site uses Akismet to reduce spam. Learn how your comment data is processed.