In computer science, selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n^2) time complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

(A functional implementation of selection sort is, however, not an in-place sort.)
Behold the abomination which is the imperative implementation (from the Wikipedia link):

int i, j;
int iMin;
/* advance the boundary of the sorted prefix one element at a time */
for (j = 0; j < n - 1; j++) {
    /* find the minimum element in the unsorted suffix a[j..n-1] */
    iMin = j;
    for (i = j + 1; i < n; i++) {
        if (a[i] < a[iMin]) {
            iMin = i;
        }
    }
    if (iMin != j) {
        swap(a[j], a[iMin]);
    }
}

Now, the functional equivalent in Haskell:

import Data.List (delete)

selectionSort :: [Int] -> [Int] -> [Int]
selectionSort sorted []       = reverse sorted
selectionSort sorted unsorted = selectionSort (smallest : sorted) (delete smallest unsorted)
  where smallest = minimum unsorted

Yes. These functional snippets use their respective implementations of the list type, which is not an efficient persistent data type for random access or update in either Haskell or Shen. Replacing the list type with Data.Sequence (a persistent data type with efficient, logarithmic-time access and update) in the Haskell snippet is trivial; I’ll leave that as an exercise for the reader. Shen is too new to support these efficient persistent types at the moment, but implementations will appear in the future, and changing the snippet would also be trivial. A Clojure implementation using its built-in efficient persistent types would likewise be trivial.
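For readers following along without a Haskell environment, the same extract-the-minimum approach can be sketched in Python (an illustration only, not the post’s code):

```python
def selection_sort(xs):
    # Repeatedly pull the minimum out of the unsorted remainder,
    # as the Haskell version does. Not in-place: builds a new list.
    result = []
    remaining = list(xs)
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)  # removes only the first occurrence
        result.append(smallest)
    return result
```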

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.

Union: Join two subsets into a single subset.
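Before the Shen version below, the same weighted union-find with path compression can be sketched in Python (my own naming, for illustration only):

```python
class DisjointSet:
    def __init__(self, n):
        # each object starts as its own root, in a subset of size 1
        self.parent = list(range(n))
        self.size = [1] * n
        self.count = n  # number of disjoint subsets

    def find(self, p):
        while p != self.parent[p]:
            # path compression: point p at its grandparent as we climb
            self.parent[p] = self.parent[self.parent[p]]
            p = self.parent[p]
        return p

    def connected(self, p, q):
        return self.find(p) == self.find(q)

    def union(self, p, q):
        i, j = self.find(p), self.find(q)
        if i == j:
            return
        # weighting: always hang the smaller tree under the larger root
        if self.size[i] < self.size[j]:
            i, j = j, i
        self.parent[j] = i
        self.size[i] += self.size[j]
        self.count -= 1
```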

My inspiration comes from Sedgewick and Wayne’s class over at Coursera: Algorithms, Part I. Check the class out if you are unfamiliar with this material and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts (usually in Java), which are the norm for most classes and textbooks on the subject.

I think this is a largely unexplored part of the field compared with the usual approach to algorithms and data structures. So here we go with another example.

As before, we are using Shen as our implementation language.

First we define our disjoint-set type.

\**\
\* Disjoint set data type (weighted and using path compression) demonstrating *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
Count : number ; Ids : (vector number) ; Sizes : (vector number);
=================================================================
[Count Ids Sizes] : disjoint-set;)

Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.

\* Create a new disjoint-set type *\
(define new
{ number --> disjoint-set }
N -> [N (range 1 N) (vector-init 1 N)])

\* Return the number of disjoint sets *\
(define count
{ disjoint-set --> number }
[Count Ids Sizes] -> Count)

\* Return id of root object *\
(define find-root
  { disjoint-set --> number --> number }
  [Count Ids Sizes] P -> (let Parent (<-vector Ids (<-vector Ids P)) \* path compression *\
                           (if (= P Parent)
                               P
                               (find-root [Count Ids Sizes] Parent))))

Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.

\* Are objects P and Q in the same set? *\
(define connected
{ disjoint-set --> number --> number --> boolean }
UF P Q -> (= (find-root UF P) (find-root UF Q)))

\* Replace sets containing P and Q with their union *\
(define quick-union
{ disjoint-set --> number --> number --> disjoint-set }
[Count Ids Sizes] P Q
-> (let UF [Count Ids Sizes]
I (find-root UF P)
J (find-root UF Q)
SizeI (<-vector Sizes I)
SizeJ (<-vector Sizes J)
SizeSum (+ SizeI SizeJ)
CIds (vector-copy Ids)
CSizes (vector-copy Sizes)
(if (= I J)
[Count CIds CSizes]
\* Always make smaller root point to larger one *\
(do (if (< SizeI SizeJ)
(do (vector-> CIds I J) (vector-> CSizes J SizeSum))
(do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
[(- Count 1) CIds CSizes]))))

After running our test we get the following output.

(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ...
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true
run time: 0.0 secs
1 : number

A Shen type-checked implementation of quick sort is even more elegant and terse than the CL version posted previously.
Pattern matching and currying make this possible.

(tc +)
(define filter
{(A --> boolean) --> (list A) --> (list A)}
_ [] -> []
T? [A | B] -> (append [A] (filter T? B)) where (T? A)
T? [_ | B] -> (filter T? B))
(define quick-sort-generic
{(list A) --> (A --> A --> boolean) --> (A --> A --> boolean) --> (list A)}
[] _ _ -> []
[A | B] L? R? -> (append (quick-sort-generic (filter (R? A) B) L? R?)
[A]
(quick-sort-generic (filter (L? A) B) L? R?)))
\* descending with duplicates:
   (quick-sort-generic [3 1 2 7 9 6 6 3 0] >= <)
   => [9 7 6 6 3 3 2 1 0] : (list number) *\
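The same generic scheme, parameterised over the two comparison predicates, can be sketched in Python (a sketch mirroring the L?/R? comparators of the Shen version, not the post’s code):

```python
def quick_sort_generic(xs, left, right):
    # left/right are two-argument predicates: elements x satisfying
    # right(pivot, x) sort before the pivot, those satisfying
    # left(pivot, x) sort after it (duplicates of the pivot included).
    if not xs:
        return []
    pivot, rest = xs[0], xs[1:]
    return (quick_sort_generic([x for x in rest if right(pivot, x)], left, right)
            + [pivot]
            + quick_sort_generic([x for x in rest if left(pivot, x)], left, right))
```

With `>=` and `<` as the predicates this reproduces the descending-with-duplicates example above; swapping them for `<=` and `>` sorts ascending.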

The first publicly available version of Shen/Kl has been released.

The Shen mission is to develop an ultra-portable version of Qi that can run under a wide variety of platforms and that incorporates features missing from Qi, such as streams. The targeted platforms are CL, Clojure, NewLisp, Emacs Lisp, ECL, Scheme, Javascript and Python, as well as C/C++, the DVM, the CLR and the JVM. This approach involved rewriting and redesigning Qi to fit within the smallest feasible instruction set.

Kl is intentionally a small ‘Lisp assembly language’ that enables Qi to be built on top of it.

In this second post in this series we look at an implementation of the always useful Fast Fourier Transform.

(FFT) An algorithm for computing the Fourier transform of a set of discrete data values. Given a finite set of data points, for example a periodic sampling taken from a real-world signal, the FFT expresses the data in terms of its component frequencies. It also solves the essentially identical inverse problem of reconstructing a signal from the frequency data.

The FFT is a mainstay of numerical analysis. Gilbert Strang described it as “the most important algorithm of our generation”. The FFT also provides the asymptotically fastest known algorithm for multiplying two polynomials.

Our implementation comes in at just under 100 lines of code.
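The post’s Shen source isn’t reproduced here, but the core recursive radix-2 Cooley–Tukey scheme that a compact FFT rests on can be sketched in Python (names and structure are my own, not the post’s):

```python
import cmath

def fft(xs):
    # Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of two.
    n = len(xs)
    if n == 1:
        return list(xs)
    # split into even- and odd-indexed halves and transform each
    evens = fft(xs[0::2])
    odds = fft(xs[1::2])
    half = n // 2
    # combine with the twiddle factors e^(-2*pi*i*k/n)
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odds[k] for k in range(half)]
    return ([evens[k] + twiddled[k] for k in range(half)] +
            [evens[k] - twiddled[k] for k in range(half)])
```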

This is the first in a series of posts that will demonstrate the implementation of many well-known (and less well-known) data structures and algorithms using a purely functional approach.
We will use Qi as our implementation language for a number of reasons:

It’s a Lisp: macros, EVAL, hash tables, property lists, meta-programming, etc.

Pattern matching.

Optional static type checking.

A Turing-complete type system!

In this first post we look at an implementation of the well-known Red-Black tree abstract data type in Qi.

Our implementation comes in at 57 lines of code (with the balance function at only 7 lines).

(tc +)
(datatype tree-node
Key : number; Val : B;
======================
[Key Val] : tree-node;)
(datatype color
if (element? Color [red black])
_______________________________
Color : color;)
(datatype tree
if (empty? Tree)
________________
Tree : tree;
Color : color; LTree : tree; TreeNode : tree-node; RTree : tree;
================================================================
[Color LTree TreeNode RTree] : tree;)
(define node-key
{tree-node --> number}
[Key Val] -> Key)
(define make-tree-black
{tree --> tree}
[Color A X B] -> [black A X B])
(define member
{tree-node --> tree --> boolean}
X NIL -> false
X [Color A Y B] -> (if (< (node-key X) (node-key Y))
(member X A)
(if (< (node-key Y) (node-key X))
(member X B)
true)))
(define balance
{tree --> tree}
[black [red [red A X B] Y C] Z D] -> [red [black A X B] Y [black C Z D]]
[black [red A X [red B Y C]] Z D] -> [red [black A X B] Y [black C Z D]]
[black A X [red [red B Y C] Z D]] -> [red [black A X B] Y [black C Z D]]
[black A X [red B Y [red C Z D]]] -> [red [black A X B] Y [black C Z D]]
S -> S)
(define insert-
{tree-node --> tree --> tree}
X [] -> [red [] X []]
X [Color A Y B] -> (if (< (node-key X) (node-key Y))
(balance [Color (insert- X A) Y B])
(if (< (node-key Y) (node-key X))
(balance [Color A Y (insert- X B)])
[Color A Y B])))
(define insert
{tree-node --> tree --> tree}
X S -> (make-tree-black (insert- X S)))

This is a reasonably performant implementation (we haven’t even tried to optimize it yet).

(19-) (run-tests NIL)
tree: [black
[red [black [red [] [1 1] []] [2 2] [red [] [5 5] []]] [7 7]
[black [red [] [8 8] []] [11 11] []]]
[14 14] [black [] [15 15] []]]
12 is a member ? false
8 is a member ? true
Creating tree with 100000 elements ...
Evaluation took:
0.578 seconds of real time
0.562833 seconds of total run time (0.491572 user, 0.071261 system)
[ Run times consist of 0.160 seconds GC time, and 0.403 seconds non-GC time. ]
97.40% CPU
1,210,617,335 processor cycles
168,551,696 bytes consed
Performing lookups in tree with 100000 elements ...
666 in tree ? true
Evaluation took:
0.000 seconds of real time
0.000044 seconds of total run time (0.000035 user, 0.000009 system)
0.00% CPU
86,110 processor cycles
0 bytes consed
-1 in tree ?
Evaluation took:
0.000 seconds of real time
0.000024 seconds of total run time (0.000021 user, 0.000003 system)
100.00% CPU
46,368 processor cycles
0 bytes consed

A comparable implementation in Java or C++ will usually run to a few hundred lines of code.
All Qi code in this post is here.
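For readers who want to experiment without a Qi environment, the same Okasaki-style balance and insert can be sketched in Python using tuples for nodes (keys only, no key/value pairs; an illustration, not the post’s code):

```python
# A tree is None (empty) or a tuple (color, left, key, right).
RED, BLACK = "red", "black"

def balance(t):
    # The four rotation cases of Okasaki's balance, as in the Qi version above.
    c, l, x, r = t
    if c == BLACK:
        if l and l[0] == RED:
            if l[1] and l[1][0] == RED:   # [black [red [red a x b] y c] z d]
                _, (_, a, xk, b), yk, cc = l
                return (RED, (BLACK, a, xk, b), yk, (BLACK, cc, x, r))
            if l[3] and l[3][0] == RED:   # [black [red a x [red b y c]] z d]
                _, a, xk, (_, b, yk, cc) = l
                return (RED, (BLACK, a, xk, b), yk, (BLACK, cc, x, r))
        if r and r[0] == RED:
            if r[1] and r[1][0] == RED:   # [black a x [red [red b y c] z d]]
                _, (_, b, yk, cc), zk, d = r
                return (RED, (BLACK, l, x, b), yk, (BLACK, cc, zk, d))
            if r[3] and r[3][0] == RED:   # [black a x [red b y [red c z d]]]
                _, b, yk, (_, cc, zk, d) = r
                return (RED, (BLACK, l, x, b), yk, (BLACK, cc, zk, d))
    return t

def insert(key, t):
    def ins(node):
        if node is None:
            return (RED, None, key, None)  # new nodes start red
        c, l, x, r = node
        if key < x:
            return balance((c, ins(l), x, r))
        if key > x:
            return balance((c, l, x, ins(r)))
        return node                        # key already present
    c, l, x, r = ins(t)
    return (BLACK, l, x, r)                # root is always recoloured black

def to_list(t):
    # In-order traversal, so the result comes out sorted.
    return [] if t is None else to_list(t[1]) + [t[2]] + to_list(t[3])
```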

Qi is the future of Lisp.
It is Lisp with many great features, such as pattern matching, a Turing-complete static type system (even more powerful than Haskell’s type system) and many others.

So in the spirit of Pi Day, here’s an implementation that calculates pi using Machin’s formula.

(define ceiling
X -> (CEILING X))
(declare ceiling [number --> number])
(define expt
X Y -> (EXPT X Y))
(declare expt [number --> number --> number])
(define floor
X Y -> (FLOOR X Y))
(declare floor [number --> number --> number])
(define log
X Y -> (LOG X Y))
(declare log [number --> number --> number])
(tc +)
(define arccot-
{number --> number --> number --> number --> number --> number}
X N XPOWER 0 _ -> 0
X N XPOWER TERM 1 -> (+ (arccot- X (+ N 2) (floor XPOWER X)
(floor XPOWER N) 0) (floor XPOWER N))
X N XPOWER TERM 0 -> (- (arccot- X (+ N 2) (floor XPOWER X)
(floor XPOWER N) 1) (floor XPOWER N)))
(define arccot
{number --> number --> number}
X UNITY -> (let XPOWER (floor (/ UNITY X) 1)
(arccot- (* X X) 1 XPOWER (floor XPOWER 1) 1)))
(define machin-pi
{number --> number}
DIGITS -> (let GUARD (+ 10 (ceiling (log DIGITS 10)))
UNITY (expt 10 (+ DIGITS GUARD))
(floor (* 4 (- (* 4 (arccot 5 UNITY))
(arccot 239 UNITY))) (expt 10 GUARD))))

And the output …

(time (machin-pi 10000))
Evaluation took:
0.379 seconds of real time
0.372112 seconds of total run time (0.269730 user, 0.102382 system)
[ Run times consist of 0.055 seconds GC time, and 0.318 seconds non-GC time. ]
98.15% CPU
903,769,308 processor cycles
78,698,160 bytes consed
314159265358979323846264338327950 ... 1655256375678 : number
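The same fixed-point scheme — integer arccot series scaled by a power of ten, combined via Machin’s identity pi/4 = 4·arccot(5) − arccot(239) — can be sketched in Python (my own names, and a simplified fixed guard of 10 digits rather than the computed guard above):

```python
def arccot(x, unity):
    # Integer Taylor series for unity * arctan(1/x):
    # 1/x - 1/(3 x^3) + 1/(5 x^5) - ...
    total = xpower = unity // x
    n, sign = 3, -1
    while xpower:
        xpower //= x * x
        total += sign * (xpower // n)
        n, sign = n + 2, -sign
    return total

def machin_pi(digits):
    guard = 10                      # extra digits to absorb truncation error
    unity = 10 ** (digits + guard)
    pi = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    return pi // 10 ** guard        # drop the guard digits
```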

This one had been sitting on my bookshelf for almost a year.

“Let Over Lambda is one of the most hardcore computer programming books out there. Starting with the fundamentals, it describes the most advanced features of the most advanced language: COMMON LISP. The point of this book is to expose you to ideas that you might otherwise never be exposed to.”

and

“Macros are what make lisp the greatest programming language in the world.”

and

“If you are looking for a dry coding manual that re-hashes common-sense techniques in whatever langue du jour, this book is not for you. This book is about pushing the boundaries of what we know about programming.”

“Only the top percentile of programmers use Lisp and if you can understand this book you are in the top percentile of Lisp programmers.”

I’m sure the author has received his fair share of derision for making statements like these and for his clear, well-researched content showing how Lisp is the greatest programming language ever invented. It may come off as conceited to some, but he is right. He’s also in good company with the greatest computer scientists ever known, who have made similarly bold statements regarding Lisp. This style is lacking in most technical writing today; there has been too much coddling and dilution in programming texts over the last decade. The fact is that unless you have mastered this language, you are in no position to even begin discussing this topic. This book, much like any of the posts on this site, is not meant to convince the majority of Blub programmers that they should be using Lisp, but to encourage the few who have an appreciation for the world’s greatest programming language to look even deeper.

Here’s a quote from the first chapter, for those who have long suspected that there is more to programming than the hype of formal object systems we’ve all been subjected to for the past few decades:

“Object systems are a formalisation of a subset of let and lambda combinations, sometimes with gimmicks like inheritance bolted on. Because of this, Lisp programmers often don’t think in terms of classes and objects. Let and lambda are fundamental; objects and classes are derivatives. As Steele says, the ‘object’ need not be a primitive notion in programming languages. Once assignable value cells and good old lambda expressions are available, object systems are, at best, occasionally useful abstractions and, at worst, special-case and redundant.”

Buy the book. Programming language choice matters.

“Language comes so naturally to us that it’s easy to believe there’s some sort of intrinsic logic connecting the thing and its name, the signifier and the signified. In one of Plato’s dialogues, a character named Cratylus argues that “a power more than human gave things their first names.”

But Cratylus was wrong. Human language is an emanation of the human mind. A thing doesn’t care what we call it. Words and their rules don’t tell us about the world; they tell us about ourselves.

That’s the simple premise behind Steven Pinker’s latest work of popular science. According to the Harvard psychologist, people are “verbivores, a species that lives on words.” If you want to understand how the brain works, how it thinks about space and causation and time, how it processes emotions and engages in social interactions, then you need to plunge “down the rabbit hole” of language. The quirks of our sentences are merely a portal to the mind.

In The Stuff of Thought, Pinker pitches himself as the broker of a scientific compromise between “linguistic determinism” and “extreme nativism.” The linguistic determinists argue that language is a prison for thought. The words we know define our knowledge of the world. Because Eskimos have more nouns for snow, they are able to perceive distinctions in snow that English speakers cannot. While Pinker deftly discredits extreme versions of this hypothesis, he admits that “boring versions” of linguistic determinism are probably accurate. It shouldn’t be too surprising that our choice of words can frame events, or that our vocabulary reflects the kinds of things we encounter in our daily life. (Why do Eskimos have so many words for snow? Because they are always surrounded by snow.) The language we learn as children might not determine our thoughts, but it certainly influences them.

Extreme nativism, on the other hand, argues that all of our mental concepts — the 50,000 or so words in the typical vocabulary — are innate. We are born knowing about carburetors and doorknobs and iPods. This bizarre theory, most closely identified with the philosopher Jerry Fodor, begins with the assumption that the meaning of words cannot be dissected into more basic parts. A doorknob is a doorknob is a doorknob. It only takes Pinker a few pages to prove the obvious, which is that each word is not an indivisible unit. The mind isn’t a blank slate, but it isn’t an overstuffed filing cabinet either.

So what is Pinker’s solution? He advocates the middle ground of “conceptual semantics,” in which the meaning of our words depends on an underlying framework of basic cognitive concepts. (As Pinker admits, he owes a big debt to Kant.) The tenses of verbs, for example, are shaped by our innate sense of time. Nouns are constrained by our intuitive notions about matter, so that we naturally parcel things into two different categories, objects and substances (pebbles versus applesauce, for example, or, as Pinker puts it, “hunks and goo”). Each material category comes with a slightly different set of grammatical rules. By looking at language from the perspective of our thoughts, Pinker demonstrates that many seemingly arbitrary aspects of speech, like that hunk and goo distinction, aren’t arbitrary at all: They are byproducts of our evolved mental machinery.

Pinker tries hard to make this tour of linguistic theory as readable as possible. He uses the f-word to explore the topic of transitive and intransitive verbs. He clarifies indirect speech by examining a scene from “Tootsie,” and Lenny Bruce makes so many appearances that he should be granted a posthumous linguistic degree. But profanity from Lenny Bruce can’t always compensate for the cryptic vocabulary and long list of competing ‘isms. Sometimes, the payoff can be disappointing. After a long chapter on curse words — this book deserves an “explicit content” warning — Pinker ends with the banal conclusion that swearing is “connected with negative emotion.” I don’t need conceptual semantics to tell me that.

The Stuff of Thought concludes with an optimistic gloss on the power of language to lead us out of the Platonic cave, so that we can “transcend our cognitive and emotional limitations.” It’s a nice try at a happy ending, but I don’t buy it. The Stuff of Thought, after all, is really about the limits of language, the way our prose and poetry are bound by innate constraints we can’t even comprehend. Flaubert was right: “Language is a cracked kettle on which we beat out tunes for bears to dance to, while all the time we long to move the stars to pity.”