Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

*Updated 08-23-2012 01:04:38*
Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.

A Haskell version of the previous codeย using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
     { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
     deriving (Read,  Show)

-- Return id of root object
findRoot :: DisjointSet -> Int -> Int
findRoot set p | p == parent = p
               | otherwise   = findRoot set parent
               where
                parent = index (ids set) (p - 1)

-- Are objects P and Q connected ?
connected :: DisjointSet -> Int -> Int -> Bool
connected set p q = (findRoot set p) == (findRoot set q)

-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -> Int -> Int -> DisjointSet
quickUnion set p q | i == j = set
                   | otherwise = DisjointSet cnt rids rsizes
                     where
                        (i, j)   = (findRoot set p, findRoot set q)
                        (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
                        (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1)
                        -- Always make smaller root point to the larger one
                        (rids, rsizes) = if psmaller
                                         then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
                                         else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10

creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False

creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Complete code

Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forestsย and the complementary algorithm : union-find.

Inย computing, aย disjoint-set data structureย is aย data structureย that keeps track of aย setย of elementsย partitionedย into a number ofย disjointย (nonoverlapping) subsets. Aย union-find algorithmย is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera :ย Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
\**\
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 =================================================================
 [Count Ids Sizes] : disjoint-set;)
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
\* Create a new disjoint-set type *\
(define new
 { number --> disjoint-set }
 N -> [N (range 1 N) (vector-init 1 N)])
\* Return the number of disjoint sets *\
(define count
 { disjoint-set --> number }
 [Count Ids Sizes] -> Count)
\* Return id of root object *\
(define find-root
 { disjoint-set --> number --> number }
 [Count Ids Sizes] P -> (let Parent 
                         \* Path Compression *\
                         (<-vector Ids (<-vector Ids P))
                         (if (= P Parent)
                             P
                             (find-root [Count Ids Sizes] Parent)))
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --> number --> number --> boolean }
 UF P Q -> (= (find-root UF P) (find-root UF Q)))
\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --> number --> number --> disjoint-set }
 [Count Ids Sizes] P Q 
 -> (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
After running our test we get the following output.
(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ... 
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number
All the code can be found here.

Alan Kay on Programming today (and a few other things)

From a recent Dr. Dobbs interview :

On adults –

Binstock:ย So you called them on the lying.

Kay:ย Yeah. But the thing that traumatized me occurred a couple years later, when I found an old copy ofย Lifeย magazine that had the Margaret Bourke-White photos from Buchenwald. This was in the 1940s โ€” no TV, living on a farm. That’s when I realized that adults were dangerous. Like, really dangerous.

On Computing as Pop Culture –

The lack of interest, the disdain for history is what makes computing not-quite-a-field.

I think the same is true of most people who write code for money. They have no idea where [their culture came from] โ€” and the Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.

On Programming –

The most disastrous thing about programming โ€” to pick one of the 10 most disastrous things about programming โ€” there’s a very popular movement based on pattern languages. Whenย Christopher Alexanderย first did that in architecture, he was looking at 2,000 years of ways that humans have made themselves comfortable. So there was actually something to it, because he was dealing with a genome that hasn’t changed that much. I think he got a few hundred valuable patterns out of it. But the bug in trying to do that in computing is the assumption that we know anything at all about programming. So extracting patterns from today’s programming practices ennobles them in a way they don’t deserve. It actually gives them more cachet.

The best teacher I had in graduate school spent the whole semester destroying any beliefs we had about computing. He was a real iconoclast. He happened to be a genius, so we took it. At the end of the course, we were free because we didn’t believe in anything. We had to learn everything, but then he destroyed it. He wanted us to understand what had been done, but he didn’t want us to believe in it.

Spring 2012 books

 

For the New Intellectual: The Philosophy of Ayn Rand (50th Anniversary Edition) – Ayn Rand
This is Ayn Rand’s challenge to the prevalent philosophical doctrines of our time and the “atmosphere of guilt, of panic, of despair, of boredom, and of all-pervasive evasion” that they create. One of the most controversial figures on the intellectual scene, Ayn Rand was the proponent of a moral philosophy–and ethic of rational self-interest–that stands in sharp opposition to the ethics of altruism and self-sacrifice. The fundamentals of this morality–“a philosophy for living on Earth”–are here vibrantly set forth by the spokesman for a new class, For the New Intellectual.


Empire of Illusion: The End of Literacy and the Triumph of Spectacleย – Chris Hedges
In this New York Times bestseller, Pulitzer Prizewinning journalist Chris Hedges has written a shattering meditation on American obsession with celebrity and the epidemic of illiteracy that threatens our cultural integrity. Reporting on such phenomena as professional wrestling, the pornographic film industry, and unchecked casino capitalism, Hedges exposes the mechanisms used to divert us from confronting the economic, political, and moral collapse around us. Empire of Illusion shows us how illiteracy and the embrace of fantasy have impoverished our working class, allowed for the continuance of destructive public policy, and ushered in cultural bankruptcy.

 

The Religion Virus: Why we believe in God: An evolutionist explains religion’s incredible hold on humanity – Craig A. James
Why is religion so incredibly tenacious? Why do intelligent people believe the universe is only six thousand years old? How can so many people believe the Bible, written over two thousand years ago, is 100% accurate in every respect?
Using the powerful new science of cultural evolution called “memetics” — how ideas spread and mutate as they move across society and down through history — Craig James takes us on a fascinating tour of religion’s peculiar and convoluted history.
Religions evolve, not metaphorically, but in a very real way. By applying “survival of the fittest” principles to religions, James shows shows us how religion became incredibly infectious to the average human.ย 

 

Adrenaline Junkies and Template Zombies: Understanding Patterns of Project Behavior – Suzanne Robertson (Author), James S. Robertson (Author), Stephen M. McMenamin (Author), Timothy R. Lister (Author), Peter Hruschka (Author), Tom DeMarco (Author)ย 
Adrenaline junkies, dead fish, project sluts, true believers, Lewis and Clark, template zombies . . .
Most developers, testers, and managers on IT projects are pretty good at recognizing patterns of behavior and gut-level hunches, as in, I sense that this project is headed for disaster.
But it has always been more difficult to transform these patterns and hunches into a usable form, something a team can debate, refine, and use. Until now.
In Adrenaline Junkies and Template Zombies, the six principal consultants of The Atlantic Systems Guild present the patterns of behavior they most often observe at the dozens of IT firms they transform each year, around the world.

 

In The Plex – Steven Levy
Few companies in history have ever been as successful and as admired as Google, the company that has transformed the Internet and become an indispensable part of our lives. How has Google done it? Veteran technology reporter Steven Levy was granted unprecedented access to the company, and in this revelatory book he takes readers inside Google headquartersโ€”the Googleplexโ€”to show how Google works.

Cognitive Robotics And Artificial Intelligence

Eccerobot

Eccerobot

At theย swissnex San Franciscoย conference earlier this year,ย scientists from Switzerland and the US discussed their research on humanoid robots, cognitive robotics, and artificial intelligence (AI). Talk revolved around how some robots self-reflect, self-improve, and adapt to new circumstances, and whether itโ€™s possible for robots of the future to possess the same cognitive characteristics as humans.

more …

Welcome to John McCarthy’s new website.

From the website: John was a legendary computer scientist at Stanford University who developed time-sharing, invented LISP, and founded the field of Artificial Intelligence.* In March 2011 John launched Project JMC with the objective to make his work more approachable and accessible. The Project JMC team is continuing to help realize his objective. In this site you will find all John’s work, including his social commentary, and acknowledgements of his outstanding contributions and impact. Additional comments, suggestions, stories, photographs and videos on John and his work are very welcome. Please send them to the Project JMC team. His old website is here.

Quick Sort in Shen

A Shen type-checked implementation of Quick Sort is even more elegant/terse compared with the CL version posted previously.
Pattern-matching and currying make this possible.

(tc +)

(define filter
  {(A --> boolean) --> (list A) --> (list A)}
  _  []      -> []
  T? [A | B] -> (append [A] (filter T? B)) where (T? A)
  T? [_ | B] -> (filter T? B))

(define quick-sort-generic
  {(list A) --> (A --> A --> boolean) --> (A --> A --> boolean) --> (list A)}
  [] _ _ -> []
  [A | B] L? R? -> (append (quick-sort-generic (filter (R? A) B) L? R?)
			   [A]
			   (quick-sort-generic (filter (L? A) B) L? R?)))

\* descending with duplicates *\
* (quick-sort-generic [3 1 2 7 9 6 6 3 0] >= <)
* [9 7 6 6 3 3 2 1 0] : (list number)

The complete code can be found here. Based on this numeric version.

Quick Sort in Common Lisp

After watching some of Tim Roughgarden’s videos on sorting algorithms, I thought I’d post an implementation of quick sort in Common Lisp as an example of a sorting algorithm implemented in CL. It’s a simple enough example(at < 20 LOC) that demonstrates one non-imperative approach to algorithm implementation. The complete code can be found here.

(defun quick-sort-generic2 (sequence cfun &optional result-type)
  (if (<= (length sequence) 1)
      (copy-seq sequence)
      (flet ((partition (fun array)
           (list (remove-if-not fun array) (remove-if fun array))))
    (let* ((result-type (or result-type 'vector))
           (pivot-ind (random (length sequence)))
           (pivot-val (elt sequence pivot-ind))
           (rem-seq
        (remove pivot-val sequence :start pivot-ind :end (+ 1 pivot-ind)))
           (part (partition (lambda (x) 
                  (apply cfun (list x pivot-val))) rem-seq)))
      (concatenate result-type
               (quick-sort-generic2 (car part) cfun result-type) 
               (list pivot-val)
               (quick-sort-generic2 (cadr part) cfun result-type))))))

* (test-sort)

started quick-sort (generic, array) ...
Evaluation took:
  0.089 seconds of real time
  0.081912 seconds of total run time (0.081587 user, 0.000325 system)
  92.13% CPU
  142,664,472 processor cycles
  8,375,024 bytes consed
  
quick-sorted 10000 items (first 10 shown) : 
#(9998 9998 9998 9997 9997 9996 9995 9994 9993 9992) 

started quick-sort (generic, list) ...
Evaluation took:
  0.062 seconds of real time
  0.058722 seconds of total run time (0.058417 user, 0.000305 system)
  95.16% CPU
  99,419,648 processor cycles
  9,371,456 bytes consed
  
quick-sorted 10000 items (first 10 shown) : 
(9999 9998 9997 9997 9996 9996 9994 9993 9993 9992) 

Happy Pi Day in Shen

Here’s a port of the previous Qi II code to Shen.
Run with Hakan Raberg’s 0.1.4 version of shen.clj (Shen implemented in Clojure !).

*
  Accurately calculates N digits of Pi using Machin's formula
  with fixed point arithmetic and variable guards digits. 

  Depends on the maths library -->
    http://www.shenlanguage.org/library.html
*

(tc +)

(define arccot-
  {number --> number --> number --> number --> number --> number} 
  X N XPOWER    0 _ -> 0
  X N XPOWER TERM 1 -> (+ (arccot- X (+ N 2) (floor (/ XPOWER X)) 
                                     (floor (/ XPOWER N)) 0) 
                          (floor (/ XPOWER N)))
  X N XPOWER TERM 0 -> (- (arccot- X (+ N 2) (floor (/ XPOWER X))
                                      (floor (/ XPOWER N)) 1) 
                          (floor (/ XPOWER N))))

(define arccot
  {number --> number --> number}
  X UNITY -> (let XPOWER (floor (/ UNITY X))
                  (arccot- (* X X) 1 XPOWER (floor XPOWER) 1)))

(define machin-pi
  {number --> number} 
  DIGITS -> (let GUARD (+ 10 (ceiling (log' DIGITS 10)))
                 UNITY (expt 10 (+ DIGITS GUARD))
                 (floor (/ (* 4 (- (* 4 (arccot 5 UNITY))
                                   (arccot 239 UNITY)))
                           (expt 10 GUARD)))))




(1+) (time (machin-pi 100))

run time: 2.56899999999996 secs
31415926535...4350265344N : number