Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

*Updated 08-23-2012 01:04:38*
Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.

A Haskell version of the previous code using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
     { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
     deriving (Read,  Show)

-- Return id of root object
findRoot :: DisjointSet -> Int -> Int
findRoot set p | p == parent = p
               | otherwise   = findRoot set parent
               where
                parent = index (ids set) (p - 1)

-- Are objects P and Q connected ?
connected :: DisjointSet -> Int -> Int -> Bool
connected set p q = (findRoot set p) == (findRoot set q)

-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -> Int -> Int -> DisjointSet
quickUnion set p q | i == j = set
                   | otherwise = DisjointSet cnt rids rsizes
                     where
                        (i, j)   = (findRoot set p, findRoot set q)
                        (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
                        (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1)
                        -- Always make smaller root point to the larger one
                        (rids, rsizes) = if psmaller
                                         then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
                                         else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10

creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False

creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Complete code

3 thoughts on “Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

  1. I don’t understand how that solves the complexity problem of your previous Shen implementation: the update to the array is done with ((ids set) // [(i-1, j)]), which if I understand correctly still means that the whole vector of ids will be copied before update (the documentation for (//) warns that it is linear in the size of the input vector: http://hackage.haskell.org/packages/archive/vector/0.7.0.1/doc/html/Data-Vector.html#v:-47–47- ).

    • You’re right. I forgot to switch the data type before posting this. I’ve changed the code to use a more efficient(access and update) persistent type (Data.Sequence) for the ids and sizes. Updates are performed in O(logN) worst case time.
      On my machine a test with 10^6 nodes now runs in ~1.5 seconds.

  2. Pingback: Purely Functional Data Structures & Algorithms : Selection Sort | imagine27

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>