Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

*Updated 08-23-2012 01:04:38*
Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.

A Haskell version of the previous code using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
     { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
     deriving (Read,  Show)

-- Return id of root object
findRoot :: DisjointSet -> Int -> Int
findRoot set p | p == parent = p
               | otherwise   = findRoot set parent
                parent = index (ids set) (p - 1)

-- Are objects P and Q connected ?
connected :: DisjointSet -> Int -> Int -> Bool
connected set p q = (findRoot set p) == (findRoot set q)

-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -> Int -> Int -> DisjointSet
quickUnion set p q | i == j = set
                   | otherwise = DisjointSet cnt rids rsizes
                        (i, j)   = (findRoot set p, findRoot set q)
                        (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
                        (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1)
                        -- Always make smaller root point to the larger one
                        (rids, rsizes) = if psmaller
                                         then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
                                         else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10

creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False

creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Complete code

Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 [Count Ids Sizes] : disjoint-set;)
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
\* Create a new disjoint-set type *\
(define new
 { number --> disjoint-set }
 N -> [N (range 1 N) (vector-init 1 N)])
\* Return the number of disjoint sets *\
(define count
 { disjoint-set --> number }
 [Count Ids Sizes] -> Count)
\* Return id of root object *\
(define find-root
 { disjoint-set --> number --> number }
 [Count Ids Sizes] P -> (let Parent 
                         \* Path Compression *\
                         (<-vector Ids (<-vector Ids P))
                         (if (= P Parent)
                             (find-root [Count Ids Sizes] Parent)))
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --> number --> number --> boolean }
 UF P Q -> (= (find-root UF P) (find-root UF Q)))
\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --> number --> number --> disjoint-set }
 [Count Ids Sizes] P Q 
 -> (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
After running our test we get the following output.
(50+) (test 10)
creating union find with 10 objects ...
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ... 
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number
All the code can be found here.

Codebreaker – A new film about the life of Alan Turing

CODEBREAKER tells the story of one of the most important people of the 20th century.  Alan Turing set in motion the computer age and his World War II codebreaking helped save two million lives.  Yet few people have heard his name, know his tragic story, or understand his legacy.  In 1954, Turing committed suicide at age 41 after being forced to undergo hormone therapy to “fix” his sexual orientation.  He  left behind a lasting legacy and lingering questions about what else he might have accomplished if society had embraced his unique genius instead of rejecting it.  100 years after his birth, an international production team takes viewers on a journey to rediscover the man and the mystery.

Previously, previously and previously.

Bayes’s Theorem is more powerful than Jesus

Richard Carrier puts forward a fantastic approach to verifying history in his latest book :

Proving History: Bayes’s Theorem and the Quest for the Historical Jesus

“… historian Richard C. Carrier proposes Bayes’s theorem as a solution to the problem of establishing reliable historical criteria. He demonstrates that valid historical methods—not only in the study of Christian origins but in any historical study—can be described by, and reduced to, the logic of Bayes’s theorem. Conversely, he argues that any method that cannot be reduced to Bayes’s theorem is invalid and should be abandoned.”

And in case you’re wondering, yes it’s Bayes’s Theorem and not Bayes’ Theorem.

Alan Kay on Programming today (and a few other things)

From a recent Dr. Dobbs interview :

On adults –

Binstock: So you called them on the lying.

Kay: Yeah. But the thing that traumatized me occurred a couple years later, when I found an old copy of Life magazine that had the Margaret Bourke-White photos from Buchenwald. This was in the 1940s — no TV, living on a farm. That’s when I realized that adults were dangerous. Like, really dangerous.

On Computing as Pop Culture –

The lack of interest, the disdain for history is what makes computing not-quite-a-field.

I think the same is true of most people who write code for money. They have no idea where [their culture came from] — and the Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.

On Programming –

The most disastrous thing about programming — to pick one of the 10 most disastrous things about programming — there’s a very popular movement based on pattern languages. When Christopher Alexander first did that in architecture, he was looking at 2,000 years of ways that humans have made themselves comfortable. So there was actually something to it, because he was dealing with a genome that hasn’t changed that much. I think he got a few hundred valuable patterns out of it. But the bug in trying to do that in computing is the assumption that we know anything at all about programming. So extracting patterns from today’s programming practices ennobles them in a way they don’t deserve. It actually gives them more cachet.

The best teacher I had in graduate school spent the whole semester destroying any beliefs we had about computing. He was a real iconoclast. He happened to be a genius, so we took it. At the end of the course, we were free because we didn’t believe in anything. We had to learn everything, but then he destroyed it. He wanted us to understand what had been done, but he didn’t want us to believe in it.

Spring 2012 books


For the New Intellectual: The Philosophy of Ayn Rand (50th Anniversary Edition) – Ayn Rand
This is Ayn Rand’s challenge to the prevalent philosophical doctrines of our time and the “atmosphere of guilt, of panic, of despair, of boredom, and of all-pervasive evasion” that they create. One of the most controversial figures on the intellectual scene, Ayn Rand was the proponent of a moral philosophy–and ethic of rational self-interest–that stands in sharp opposition to the ethics of altruism and self-sacrifice. The fundamentals of this morality–“a philosophy for living on Earth”–are here vibrantly set forth by the spokesman for a new class, For the New Intellectual.

Empire of Illusion: The End of Literacy and the Triumph of Spectacle – Chris Hedges
In this New York Times bestseller, Pulitzer Prizewinning journalist Chris Hedges has written a shattering meditation on American obsession with celebrity and the epidemic of illiteracy that threatens our cultural integrity. Reporting on such phenomena as professional wrestling, the pornographic film industry, and unchecked casino capitalism, Hedges exposes the mechanisms used to divert us from confronting the economic, political, and moral collapse around us. Empire of Illusion shows us how illiteracy and the embrace of fantasy have impoverished our working class, allowed for the continuance of destructive public policy, and ushered in cultural bankruptcy.


The Religion Virus: Why we believe in God: An evolutionist explains religion’s incredible hold on humanity – Craig A. James
Why is religion so incredibly tenacious? Why do intelligent people believe the universe is only six thousand years old? How can so many people believe the Bible, written over two thousand years ago, is 100% accurate in every respect?
Using the powerful new science of cultural evolution called “memetics” — how ideas spread and mutate as they move across society and down through history — Craig James takes us on a fascinating tour of religion’s peculiar and convoluted history.
Religions evolve, not metaphorically, but in a very real way. By applying “survival of the fittest” principles to religions, James shows shows us how religion became incredibly infectious to the average human. 


Adrenaline Junkies and Template Zombies: Understanding Patterns of Project Behavior – Suzanne Robertson (Author), James S. Robertson (Author), Stephen M. McMenamin (Author), Timothy R. Lister (Author), Peter Hruschka (Author), Tom DeMarco (Author) 
Adrenaline junkies, dead fish, project sluts, true believers, Lewis and Clark, template zombies . . .
Most developers, testers, and managers on IT projects are pretty good at recognizing patterns of behavior and gut-level hunches, as in, I sense that this project is headed for disaster.
But it has always been more difficult to transform these patterns and hunches into a usable form, something a team can debate, refine, and use. Until now.
In Adrenaline Junkies and Template Zombies, the six principal consultants of The Atlantic Systems Guild present the patterns of behavior they most often observe at the dozens of IT firms they transform each year, around the world.


In The Plex – Steven Levy
Few companies in history have ever been as successful and as admired as Google, the company that has transformed the Internet and become an indispensable part of our lives. How has Google done it? Veteran technology reporter Steven Levy was granted unprecedented access to the company, and in this revelatory book he takes readers inside Google headquarters—the Googleplex—to show how Google works.

Computers for Cynics

Ted Nelson‘s latest video series called “Computers for Cynics” calls it like it is. How refreshing in contrast with the pervasive garbage called ‘information’ that we get fed in the second decade of the 21st century.

I laughed and wept at part 1 : The Myth of Technology

Quite timely considering the recent round of UI design atrocities spread under the mask of ‘good design’.
As if a bunch of over-payed(Windows), over-stroked(Gnome), over-controlling(Mac) middle-managed programmers can do what Steve Jobs did.

What’s my UI ? Not for the faint of heart : mostly emacs in a terminal window and stumpwm (only really possible in Linux).

RelatedThe more idiot-proof the system, the more you will constrain behavior, forcing people to act like idiots, even when it’s against their better judgment …

Cognitive Robotics And Artificial Intelligence



At the swissnex San Francisco conference earlier this year, scientists from Switzerland and the US discussed their research on humanoid robots, cognitive robotics, and artificial intelligence (AI). Talk revolved around how some robots self-reflect, self-improve, and adapt to new circumstances, and whether it’s possible for robots of the future to possess the same cognitive characteristics as humans.

more …