***Updated 08-23-2012 01:04:38***

**Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.**

A Haskell version of the previous code using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression). -- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1)) -- For M union operations on a set of N elements. -- O((M+N)lg*N) worst-case find time (practically O(1)) -- For M connected(find) operations on a set of N elements. data DisjointSet = DisjointSet { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) } deriving (Read, Show) -- Return id of root object findRoot :: DisjointSet -> Int -> Int findRoot set p | p == parent = p | otherwise = findRoot set parent where parent = index (ids set) (p - 1) -- Are objects P and Q connected ? connected :: DisjointSet -> Int -> Int -> Bool connected set p q = (findRoot set p) == (findRoot set q) -- Replace sets containing P and Q with their union quickUnion :: DisjointSet -> Int -> Int -> DisjointSet quickUnion set p q | i == j = set | otherwise = DisjointSet cnt rids rsizes where (i, j) = (findRoot set p, findRoot set q) (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1)) (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1) -- Always make smaller root point to the larger one (rids, rsizes) = if psmaller then (update (i - 1) j (ids set), update (j - 1) size (sizes set)) else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10 creating union find with 10 objects ...DONE DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]} All objects are disconnected. 1 and 9 connected ? False 4 and 6 connected ? False 3 and 1 connected ? False 7 and 8 connected ? False creating unions ...DONE DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]} All objects are connected (only 1 group). 1 and 9 connected ? True 4 and 6 connected ? True 3 and 1 connected ? True 7 and 8 connected ? True real 0m0.002s user 0m0.000s sys 0m0.000s

I don’t understand how that solves the complexity problem of your previous Shen implementation: the update to the array is done with ((ids set) // [(i-1, j)]), which if I understand correctly still means that the whole vector of ids will be copied before update (the documentation for (//) warns that it is linear in the size of the input vector: http://hackage.haskell.org/packages/archive/vector/0.7.0.1/doc/html/Data-Vector.html#v:-47–47- ).

You’re right. I forgot to switch the data type before posting this. I’ve changed the code to use a more efficient(access and update) persistent type (Data.Sequence) for the ids and sizes. Updates are performed in O(logN) worst case time.

On my machine a test with 10^6 nodes now runs in ~1.5 seconds.

Pingback: Purely Functional Data Structures & Algorithms : Selection Sort | imagine27