# Purely Functional Data Structures & Algorithms : Selection Sort

*Updated @ 2012-08-31 02:08:58 due to internet pedantry*

According to Wikipedia :

In computer science, a Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n2) time complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

(A functional implementation of selection sort is however, not an in-place sort.)
Behold the abomination which is the imperative implementation (from the Wikipedia link) :

```int i,j;
int iMin;

for (j = 0; j < n-1; j++) {
iMin = j;
for ( i = j+1; i < n; i++) {
if (a[i] < a[iMin]) {
iMin = i;
}
}

if ( iMin != j ) {
swap(a[j], a[iMin]);
}
}```

Now, the functional equivalent in Haskell :

```selectionSort :: [Int] -> [Int] -> [Int]
selectionSort sorted [] = reverse sorted
selectionSort sorted unsorted = selectionSort (min:sorted) (delete min unsorted)
where min = minimum unsorted```

Or in Shen :

```(define selection-sort-aux
{ (list number) --> (list number) --> (list number) }
Sorted []       -> (reverse Sorted)
Sorted Unsorted -> (let Min (minimum Unsorted)
(selection-sort-aux (cons Min Sorted) (remove-first Min Unsorted))))```

Yes. These functional snippets use their respective implementations of the list type (which is not an efficient persistent data type in either Haskell or Shen for accesses or updates). Replacing the List type with Data.Sequence(a persistent data type with efficient constant access and update) for the Haskell snippet is trivial. I’ll leave that as an exercise to the reader. Shen is too new to support these efficient persistent types at the moment but implementations will appear in the future and changing the snippet would also be trivial. A Clojure implementation using it’s already built in efficient persistent types would also be trivial.

The complete code can be found here.

# Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

• Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
• Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
```\**\
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
Count : number ; Ids : (vector number) ; Sizes : (vector number);
=================================================================
[Count Ids Sizes] : disjoint-set;)```
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
```\* Create a new disjoint-set type *\
(define new
{ number --> disjoint-set }
N -> [N (range 1 N) (vector-init 1 N)])```
```\* Return the number of disjoint sets *\
(define count
{ disjoint-set --> number }
[Count Ids Sizes] -> Count)```
```\* Return id of root object *\
(define find-root
{ disjoint-set --> number --> number }
[Count Ids Sizes] P -> (let Parent
\* Path Compression *\
(<-vector Ids (<-vector Ids P))
(if (= P Parent)
P
(find-root [Count Ids Sizes] Parent)))```
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
```\* Are objects P and Q in the set ? *\
(define connected
{ disjoint-set --> number --> number --> boolean }
UF P Q -> (= (find-root UF P) (find-root UF Q)))```
```\* Replace sets containing P and Q with their union *\
(define quick-union
{ disjoint-set --> number --> number --> disjoint-set }
[Count Ids Sizes] P Q
-> (let UF [Count Ids Sizes]
I (find-root UF P)
J (find-root UF Q)
SizeI (<-vector Sizes I)
SizeJ (<-vector Sizes J)
SizeSum (+ SizeI SizeJ)
CIds (vector-copy Ids)
CSizes (vector-copy Sizes)
(if (= I J)
[Count CIds CSizes]
\* Always make smaller root point to larger one *\
(do (if (< SizeI SizeJ)
(do (vector-> CIds I J) (vector-> CSizes J SizeSum))
(do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
[(- Count 1) CIds CSizes]))))```
After running our test we get the following output.
```(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ...
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number```
All the code can be found here.

# Quick Sort in Shen

A Shen type-checked implementation of Quick Sort is even more elegant/terse compared with the CL version posted previously.
Pattern-matching and currying make this possible.

```(tc +)

(define filter
{(A --> boolean) --> (list A) --> (list A)}
_  []      -> []
T? [A | B] -> (append [A] (filter T? B)) where (T? A)
T? [_ | B] -> (filter T? B))

(define quick-sort-generic
{(list A) --> (A --> A --> boolean) --> (A --> A --> boolean) --> (list A)}
[] _ _ -> []
[A | B] L? R? -> (append (quick-sort-generic (filter (R? A) B) L? R?)
[A]
(quick-sort-generic (filter (L? A) B) L? R?)))

\* descending with duplicates *\
* (quick-sort-generic [3 1 2 7 9 6 6 3 0] >= <)
* [9 7 6 6 3 3 2 1 0] : (list number)```

# Happy Pi Day in Shen

Here’s a port of the previous Qi II code to Shen.
Run with Hakan Raberg’s 0.1.4 version of shen.clj (Shen implemented in Clojure !).

```*
Accurately calculates N digits of Pi using Machin's formula
with fixed point arithmetic and variable guards digits.

Depends on the maths library -->
http://www.shenlanguage.org/library.html
*

(tc +)

(define arccot-
{number --> number --> number --> number --> number --> number}
X N XPOWER    0 _ -> 0
X N XPOWER TERM 1 -> (+ (arccot- X (+ N 2) (floor (/ XPOWER X))
(floor (/ XPOWER N)) 0)
(floor (/ XPOWER N)))
X N XPOWER TERM 0 -> (- (arccot- X (+ N 2) (floor (/ XPOWER X))
(floor (/ XPOWER N)) 1)
(floor (/ XPOWER N))))

(define arccot
{number --> number --> number}
X UNITY -> (let XPOWER (floor (/ UNITY X))
(arccot- (* X X) 1 XPOWER (floor XPOWER) 1)))

(define machin-pi
{number --> number}
DIGITS -> (let GUARD (+ 10 (ceiling (log' DIGITS 10)))
UNITY (expt 10 (+ DIGITS GUARD))
(floor (/ (* 4 (- (* 4 (arccot 5 UNITY))
(arccot 239 UNITY)))
(expt 10 GUARD)))))

(1+) (time (machin-pi 100))

run time: 2.56899999999996 secs
31415926535...4350265344N : number
```

# Shen/Kl arrive

The first publicly available version of Shen/Kl has been released.

The Shen mission is to develop an ultra-portable version of Qi that can run under a wide variety of platforms and which incorporates missing features in Qi such as streams. The targeted platforms are CL, Clojure, NewLisp, Emacs Lisp, ECL, Scheme, Javascript and Python as well as C/C++, DVM, CLR or the JVM. This approach involved rewriting and redesigning Qi to fit within the smallest feasible instruction set.

Kl is intentionally a small ‘Lisp assembly language’ that enables Qi to be built on top of it.

more …

# Purely Functional Data Structures & Algorithms : Fast Fourier Transform in Qi In this second post in this series we look at an implementation of the always useful Fast Fourier Transform.

(FFT) An algorithm for computing the Fourier transform of a set of discrete data values. Given a finite set of data points, for example a periodic sampling taken from a real-world signal, the FFT expresses the data in terms of its component frequencies. It also solves the essentially identical inverse problem of reconstructing a signal from the frequency data.

The FFT is a mainstay of numerical analysis. Gilbert Strang described it as “the most important algorithm of our generation”. The FFT also provides the asymptotically fastest known algorithm for multiplying two polynomials.

Our implementation comes in at just under 100 lines of code

``` Math
(declare atan [number --> number])
(define atan X -> (ATAN X))

(declare cos [number --> number])
(define cos X -> (COS X))

(declare sin [number --> number])
(define sin X -> (SIN X))

(tc +)

Complex numbers

(datatype complex
Real : number; Imag : number;
=============================
[Real Imag] : complex;)

(define complex-mult
{complex --> complex --> complex}
[R1 I1] [R2 I2] -> [(- (* R1 R2) (* I1 I2))
(+ (* R1 I2) (* I1 R2))])

{complex --> complex --> complex}
[R1 I1] [R2 I2] -> [(+ R1 R2) (+ I1 I2)])

(define complex-diff
{complex --> complex --> complex}
[R1 I1] [R2 I2] -> [(- R1 R2) (- I1 I2)])

Fast Fourier Transform

(define butterfly-list
{((list complex) * ((list complex) * (list complex)))
--> ((list complex) * ((list complex) * (list complex)))}
(@p X (@p X1 X2)) -> (if (empty? X)
(@p X (@p (reverse X1) (reverse X2)))
(butterfly-list
(@p (tail (tail X))

(define calc-results
{(((list complex) * (list (list complex))) *
((list complex) * (list complex)))
--> (((list complex) * (list (list complex))) *
((list complex) * (list complex)))}
(@p (@p [W WN] [YA YB]) (@p Y1 Y2)) ->
(if (and (empty? Y1) (empty? Y2))
(@p (@p [W WN] [(reverse YA) (reverse YB)]) (@p Y1 Y2))
(calc-results
(@p (@p [(complex-mult W WN) WN]
(@p (tail Y1) (tail Y2))))))

(define fft
{number --> complex --> (list complex) --> (list complex)
--> (list complex)}
1 WN X Y -> [(head X)]
N WN X Y -> (let M   (round (/ N 2))
Inp (butterfly-list (@p X (@p [] [])))
X1  (fst (snd Inp))
X2  (snd (snd Inp))
Y1  (fft M (complex-mult WN WN) X1 [])
Y2  (fft M (complex-mult WN WN) X2 [])
W   [1 0]
Res (calc-results (@p (@p [W WN] [[] []]) (@p Y1 Y2)))

(define dotimes-fft
{number --> number --> complex --> (list complex) --> (list complex)
--> (list complex)}
Iterations Size W Input Res ->
(if ( number --> (list complex)
--> (list complex)}
Iterations Size Input -> (let Pi    (* 4 (atan 1))
Theta (* 2 (/ Pi Size))
W     [(cos Theta) (* -1 (sin Theta))]
(dotimes-fft Iterations Size W Input [])))```

Let’s give it a spin …

``` Square wave test

(26-) (time (run-fft 100000 16
[[0 0] [1 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 0]
[0 0] [1 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 0]]))

Evaluation took:
2.999 seconds of real time
2.942718 seconds of total run time (2.798716 user, 0.144002 system)
[ Run times consist of 0.371 seconds GC time, and 2.572 seconds non-GC time. ]
98.13% CPU
6,282,874,678 processor cycles
1,641,619,888 bytes consed

[[8 0] [0.0 0.0] [0.0 0.0] [0.0 0.0]
[0.0 0.0] [0.0 0.0] [0.0 0.0] [0.0 0.0]
[-8 0] [0.0 0.0] [0.0 0.0] [0.0 0.0]
[0.0 0.0] [0.0 0.0] [0.0 0.0] [0.0 0.0]] : (list complex)```

All Qi code in this post is here.

# Purely Functional Data Structures & Algorithms : Red-Black Trees in Qi

Update 2011/06/28 : Source has been modified to compile with Shen

This is the first in a series of posts that will demonstrate the implementation of many well-known(and less known) data structures and algorithms using a purely functional approach.
We will use Qi as our implementation language for a number of reasons :

It’s a Lisp : macros, EVAL, hash-tables, property-lists, meta-programming etc.
Pattern matching.
Optional static type checking.
A Turing-complete type system !

In this first post we look at an implementation of the well-known Red-Black tree abstract data type in Qi.

A red–black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named “symmetric binary B-tree,” but acquired its modern name in a paper in 1978 by Leonidas J. Guibas and Robert Sedgewick. It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is total number of elements in the tree. Put very simply, a red–black tree is a binary search tree that inserts and removes intelligently, to ensure the tree is reasonably balanced. Our implementation comes in at 57 lines of code (with the balance function at only 7 lines)

```(tc +)

(datatype tree-node
Key : number; Val : B;
======================
[Key Val] : tree-node;)

(datatype color
if (element? Color [red black])
_______________________________
Color : color;)

(datatype tree
if (empty? Tree)
________________
Tree : tree;

Color : color; LTree : tree; TreeNode : tree-node; RTree : tree;
================================================================
[Color LTree TreeNode RTree] : tree;)

(define node-key
{tree-node --> number}
[Key Val] -> Key)

(define make-tree-black
{tree --> tree}
[Color A X B] -> [black A X B])

(define member
{tree-node --> tree --> boolean}
X NIL -> false
X [Color A Y B] -> (if (< (node-key X) (node-key Y))
(member X A)
(if (< (node-key Y) (node-key X))
(member X B)
true)))

(define balance
{tree --> tree}
[black [red [red A X B] Y C] Z D] -> [red [black A X B] Y [black C Z D]]
[black [red A X [red B Y C]] Z D] -> [red [black A X B] Y [black C Z D]]
[black A X [red [red B Y C] Z D]] -> [red [black A X B] Y [black C Z D]]
[black A X [red B Y [red C Z D]]] -> [red [black A X B] Y [black C Z D]]
S -> S)

(define insert-
{tree-node --> tree --> tree}
X [] -> [red [] X []]
X [Color A Y B] -> (if (< (node-key X) (node-key Y))
(balance [Color (insert- X A) Y B])
(if (< (node-key Y) (node-key X))
(balance [Color A Y (insert- X B)])
[Color A Y B])))

(define insert
{tree-node --> tree --> tree}
X S -> (make-tree-black (insert- X S)))
```

This is a reasonably performant implementation (we haven’t even tried to optimize it yet).

```(19-) (run-tests NIL)
tree: [black
[red [black [red [] [1 1] []] [2 2] [red [] [5 5] []]] [7 7]
[black [red [] [8 8] []] [11 11] []]]
[14 14] [black [] [15 15] []]]
12 is a member ? false
8 is a member ? true

Creating tree with 100000 elements ...
Evaluation took:
0.578 seconds of real time
0.562833 seconds of total run time (0.491572 user, 0.071261 system)
[ Run times consist of 0.160 seconds GC time, and 0.403 seconds non-GC time. ]
97.40% CPU
1,210,617,335 processor cycles
168,551,696 bytes consed

Performing lookups in tree with 100000 elements ...
666 in tree ? true
Evaluation took:
0.000 seconds of real time
0.000044 seconds of total run time (0.000035 user, 0.000009 system)
0.00% CPU
86,110 processor cycles
0 bytes consed

-1 in tree ?
Evaluation took:
0.000 seconds of real time
0.000024 seconds of total run time (0.000021 user, 0.000003 system)
100.00% CPU
46,368 processor cycles
0 bytes consed
```

A comparable implementation in Java/C++ will usually run a few hundred lines of code.
All Qi code in this post is here.

# Happy PI day ! (in QiII)

Qi is the future of Lisp.
It is Lisp with many great features such as pattern-matching, a turing complete static type system (even more powerful than Haskell’s type system) and many others.

So in the spirit of PI day, here’s an implementation that calculates PI using Machin’s formula.

```(define ceiling
X -> (CEILING X))
(declare ceiling [number --> number])

(define expt
X Y -> (EXPT X Y))
(declare expt [number --> number --> number])

(define floor
X Y -> (FLOOR X Y))
(declare floor [number --> number --> number])

(define log
X Y -> (LOG X Y))
(declare log [number --> number --> number])

(tc +)

(define arccot-
{number --> number --> number --> number --> number --> number}
X N XPOWER    0 _ -> 0
X N XPOWER TERM 1 -> (+ (arccot- X (+ N 2) (floor XPOWER X)
(floor XPOWER N) 0) (floor XPOWER N))
X N XPOWER TERM 0 -> (- (arccot- X (+ N 2) (floor XPOWER X)
(floor XPOWER N) 1) (floor XPOWER N)))

(define arccot
{number --> number --> number}
X UNITY -> (let XPOWER (floor (/ UNITY X) 1)
(arccot- (* X X) 1 XPOWER (floor XPOWER 1) 1)))

(define machin-pi
{number --> number}
DIGITS -> (let GUARD (+ 10 (ceiling (log DIGITS 10)))
UNITY (expt 10 (+ DIGITS GUARD))
(floor (* 4 (- (* 4 (arccot 5 UNITY))
(arccot 239 UNITY))) (expt 10 GUARD))))
```

And the output …

```(time (machin-pi 10000))

Evaluation took:
0.379 seconds of real time
0.372112 seconds of total run time (0.269730 user, 0.102382 system)
[ Run times consist of 0.055 seconds GC time, and 0.318 seconds non-GC time. ]
98.15% CPU
903,769,308 processor cycles
78,698,160 bytes consed

314159265358979323846264338327950 ... 1655256375678 : number
```

Compared with Common Lisp, Haskell and Clojure. “Let over Lambda – 50 Years of Lisp” by Doug Hoyte

This one had been sitting on my bookshelf for almost a year.

“Let Over Lambda is one of the most hardcore computer programming books out there. Starting with the fundamentals, it describes the most advanced features of the most advanced language: COMMON LISP. The point of this book is to expose you to ideas that you might otherwise never be exposed to.”

and

“Macros are what make lisp the greatest programming language in the world.”

and

“If you are looking for a dry coding manual that re-hashes common-sense techniques in whatever langue du jour, this book is not for you. This book is about pushing the boundaries of what we know about programming.”

“Only the top percentile of programmers use Lisp and if you can understand this book you are in the top percentile of Lisp programmers.”

I’m sure that the author has received his fair share of derision for making statements like these and for his clear and well researched content showing how Lisp is the greatest programming language ever invented. It may come off as conceited to some, but he is right. He’s also in good company with the greatest computer scientists ever known who have also made similarly bold statements regarding Lisp. This style is lacking in most technical writing today, there has been too much coddling and dilution in programming texts over the last decade. The fact is that unless you have mastered this language then you are in no position to even begin disussing this topic. This book, much like any of the posts on this site, is not for convincing the majority of Blub programmers that they should be using Lisp but to encourage a few that have an appreciation for the world’s greatest programming language to look even deeper.

Here’s a quote from the first chapter for those that might have known for a long while that there is more to programming than the hype of formal object systems that we’ve all been subjected to for the past few decades :

“Object systems are a formalisation of a subset of let and lambda combinations, sometimes with gimmicks like inheritance bolted on. Because of this, Lisp programmers often don’t think in terms of classes and objects. Let and lambda are fundamental; objects and classes are derivatives. As Steele says, the ‘object’ need not be a primitive notion in programming languages. Once assignable value cells and good old lambda expressions are available, object systems are, at best, occasionally useful abstractions and, at worst, special-case and redundant.”

Buy the book. Programming language choice matters.

“Introduction to Metamathematics” by Stephen Cole Kleene

“The Stuff of Thought: Language as a Window into Human Nature” by Steven Pinker

From the Washington Post :

“Language comes so naturally to us that it’s easy to believe there’s some sort of intrinsic logic connecting the thing and its name, the signifier and the signified. In one of Plato’s dialogues, a character named Cratylus argues that “a power more than human gave things their first names.”

But Cratylus was wrong. Human language is an emanation of the human mind. A thing doesn’t care what we call it. Words and their rules don’t tell us about the world; they tell us about ourselves.

That’s the simple premise behind Steven Pinker’s latest work of popular science. According to the Harvard psychologist, people are “verbivores, a species that lives on words.” If you want to understand how the brain works, how it thinks about space and causation and time, how it processes emotions and engages in social interactions, then you need to plunge “down the rabbit hole” of language. The quirks of our sentences are merely a portal to the mind.

In The Stuff of Thought, Pinker pitches himself as the broker of a scientific compromise between “linguistic determinism” and “extreme nativism.” The linguistic determinists argue that language is a prison for thought. The words we know define our knowledge of the world. Because Eskimos have more nouns for snow, they are able to perceive distinctions in snow that English speakers cannot. While Pinker deftly discredits extreme versions of this hypothesis, he admits that “boring versions” of linguistic determinism are probably accurate. It shouldn’t be too surprising that our choice of words can frame events, or that our vocabulary reflects the kinds of things we encounter in our daily life. (Why do Eskimos have so many words for snow? Because they are always surrounded by snow.) The language we learn as children might not determine our thoughts, but it certainly influences them.

Extreme nativism, on the other hand, argues that all of our mental concepts — the 50,000 or so words in the typical vocabulary — are innate. We are born knowing about carburetors and doorknobs and iPods. This bizarre theory, most closely identified with the philosopher Jerry Fodor, begins with the assumption that the meaning of words cannot be dissected into more basic parts. A doorknob is a doorknob is a doorknob. It only takes Pinker a few pages to prove the obvious, which is that each word is not an indivisible unit. The mind isn’t a blank slate, but it isn’t an overstuffed filing cabinet either.

So what is Pinker’s solution? He advocates the middle ground of “conceptual semantics,” in which the meaning of our words depends on an underlying framework of basic cognitive concepts. (As Pinker admits, he owes a big debt to Kant.) The tenses of verbs, for example, are shaped by our innate sense of time. Nouns are constrained by our intuitive notions about matter, so that we naturally parcel things into two different categories, objects and substances (pebbles versus applesauce, for example, or, as Pinker puts it, “hunks and goo”). Each material category comes with a slightly different set of grammatical rules. By looking at language from the perspective of our thoughts, Pinker demonstrates that many seemingly arbitrary aspects of speech, like that hunk and goo distinction, aren’t arbitrary at all: They are byproducts of our evolved mental machinery.

Pinker tries hard to make this tour of linguistic theory as readable as possible. He uses the f-word to explore the topic of transitive and intransitive verbs. He clarifies indirect speech by examining a scene from “Tootsie,” and Lenny Bruce makes so many appearances that he should be granted a posthumous linguistic degree. But profanity from Lenny Bruce can’t always compensate for the cryptic vocabulary and long list of competing ‘isms. Sometimes, the payoff can be disappointing. After a long chapter on curse words — this book deserves an “explicit content” warning — Pinker ends with the banal conclusion that swearing is “connected with negative emotion.” I don’t need conceptual semantics to tell me that.

The Stuff of Thought concludes with an optimistic gloss on the power of language to lead us out of the Platonic cave, so that we can “transcend our cognitive and emotional limitations.” It’s a nice try at a happy ending, but I don’t buy it. The Stuff of Thought, after all, is really about the limits of language, the way our prose and poetry are bound by innate constraints we can’t even comprehend. Flaubert was right: “Language is a cracked kettle on which we beat out tunes for bears to dance to, while all the time we long to move the stars to pity.”