***Updated 08-23-2012 01:04:38***

**Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.**

A Haskell version of the previous codeΒ using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
{ count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
deriving (Read, Show)
-- Return id of root object
findRoot :: DisjointSet -> Int -> Int
findRoot set p | p == parent = p
| otherwise = findRoot set parent
where
parent = index (ids set) (p - 1)
-- Are objects P and Q connected ?
connected :: DisjointSet -> Int -> Int -> Bool
connected set p q = (findRoot set p) == (findRoot set q)
-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -> Int -> Int -> DisjointSet
quickUnion set p q | i == j = set
| otherwise = DisjointSet cnt rids rsizes
where
(i, j) = (findRoot set p, findRoot set q)
(i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
(cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1)
-- Always make smaller root point to the larger one
(rids, rsizes) = if psmaller
then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10
creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False
creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True
real 0m0.002s
user 0m0.000s
sys 0m0.000s

Complete code

### Like this:

Like Loading...

I don’t understand how that solves the complexity problem of your previous Shen implementation: the update to the array is done with ((ids set) // [(i-1, j)]), which if I understand correctly still means that the whole vector of ids will be copied before update (the documentation for (//) warns that it is linear in the size of the input vector: http://hackage.haskell.org/packages/archive/vector/0.7.0.1/doc/html/Data-Vector.html#v:-47–47- ).

Youβre right. I forgot to switch the data type before posting this. Iβve changed the code to use a more efficient(access and update) persistent type (Data.Sequence) for the ids and sizes. Updates are performed in O(logN) worst case time.

On my machine a test with 10^6 nodes now runs in ~1.5 seconds.

Pingback: Purely Functional Data Structures & Algorithms : Selection Sort | imagine27