Ring probabilities in F#


A few months back I took a look at Elixir. More recently I’ve been exploring F# and I’m very pleased with the experience so far. Here is the ring probabilities algorithm implemented using F#. It’s unlikely that I will ever use Elixir again because having a powerful static type system provided by F# at my disposal is just too good.

let rec calcStateProbs (prob: float, i: int,
                        currProbs: float [], newProbs: float []) = 
  if i < 0 then
    let maxIndex = currProbs.Length-1
    // Match prev, next probs based on the fact that this is a
    // ring structure.
    let (prevProb, nextProb) =
      match i with
        | i when i = maxIndex -> (currProbs.[i-1], currProbs.[0])
        | 0 -> (currProbs.[maxIndex], currProbs.[i+1])
        | _ -> (currProbs.[i-1], currProbs.[i+1])
    let newProb = prob * prevProb + (1.0 - prob) * nextProb
    Array.set newProbs i newProb
    calcStateProbs(prob, i-1, currProbs, newProbs)

let calcRingProbs parsedArgs =
  // Probs at S = 0.
  //   Make certain that we are positioned at only start location.
  //     e.g. P(Start Node) = 1
  let startProbs =
    Array.concat [ [| 1.0 |] ; [| for _ in 1 .. parsedArgs.nodes - 1 -> 0.0 |] ] 
  let endProbs =
    List.fold (fun probs _ ->
               calcStateProbs(parsedArgs.probability, probs.Length-1,
                              probs, Array.create probs.Length 0.0))
              startProbs [1..parsedArgs.states]

Here’s the code.
No promises this time but I may follow this sequential version up with a parallelized version.

O-notation considered harmful (use Analytic Combinatorics instead)

From Robert Sedgwick's lecture "Analytic Combinatorics, Part I - A Scientific Approach".https://class.coursera.org/introACpartI-001/lecture/index

From Robert Sedgewick’s lecture “Analytic Combinatorics, Part I – A Scientific Approach”.

For so long I’ve been skeptical about the classic approach of the “Theory of Algorithms” and its misuse and misunderstanding by many software engineers and programmers. Big O notation, the Big Theta Θ and Big Omega Ω notations are often not useful for comparing the performance of algorithms in practice. They are often not even useful for classifying algorithms. They are useful for determining theoretical limits of an algorithms’ performance. In other words, their theoretical lower bound, upper bound or both.
I’ve had to painfully and carefully argue this point a few times as an interviewee and many times as part of a team of engineers. In the first case it can mean the difference between impressing the interviewer or missing out on a great career opportunity due simply to ignorance and/or incorrigibility of the person interviewing you. In the latter it could mean wasted months or even years in implementation effort and/or a financial failure in the worst case.

In practice the O-notation approach to algorithmic analysis can often be quite misleading. Quick Sort vs. Merge Sort is a great example. Quick Sort is classified as time quadratic O(n²) and Merge Sort as time log-linear O(n log n) according to O-notation. In practice however, Quick Sort often performs twice as fast as Merge Sort and is also far more space efficient. As many folks know this has to do with the typical inputs of these algorithms in practice. Most engineers I know would still argue that Merge Sort is a better solution and apparently Robert has had the same argumentative response even though he is an expert in the field. In the lecture he kindly says the following : “… Such people usually don’t program much and shouldn’t be recommending what practitioners do”.

There are many more numerous examples of where practical application does not align with the use of O-notation. Also, detailed analysis of algorithmic performance just takes too long to be useful in practice most of the time. So what other options do we have ?

There is a better way. An emerging science called “Analytic Combinatorics” pioneered by Robert Sedgewick and the late Philippe Flajolet over the past 30 years with the first (and only) text appearing in 2009 called Analytic Combinatorics. This approach is based on the scientific method and provides an accurate and more efficient way to determine the performance of algorithms(and classify them correctly). It even makes it possible to reason about an algorithms’ performance based on real-world input. It also allows for the generation of random data for a particular structure or structures, among other benefits.

For an introduction by the same authors there is An Introduction to the Analysis of Algorithms(or the free PDF version)and Sedgewick’s video course. Just to make it clear how important this new approach is going to be to computer science (and other sciences), here’s what another CS pioneer has to say :

“[Sedgewick and Flajolet] are not only worldwide leaders of the field, they also are masters of exposition. I am sure that every serious computer scientist will find this book rewarding in many ways.” —From the Foreword by Donald E. Knuth


Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

*Updated 08-23-2012 01:04:38*
Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.

A Haskell version of the previous code using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression).
-- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1))
-- For M union operations on a set of N elements.
-- O((M+N)lg*N) worst-case find time (practically O(1))
-- For M connected(find) operations on a set of N elements.
data DisjointSet = DisjointSet
     { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) }
     deriving (Read,  Show)

-- Return id of root object
findRoot :: DisjointSet -&gt; Int -&gt; Int
findRoot set p | p == parent = p
               | otherwise   = findRoot set parent
                parent = index (ids set) (p - 1)

-- Are objects P and Q connected ?
connected :: DisjointSet -&gt; Int -&gt; Int -&gt; Bool
connected set p q = (findRoot set p) == (findRoot set q)

-- Replace sets containing P and Q with their union
quickUnion :: DisjointSet -&gt; Int -&gt; Int -&gt; DisjointSet
quickUnion set p q | i == j = set
                   | otherwise = DisjointSet cnt rids rsizes
                        (i, j)   = (findRoot set p, findRoot set q)
                        (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1))
                        (cnt, psmaller, size) = (count set - 1, i1 &lt; j1, i1 + j1)
                        -- Always make smaller root point to the larger one
                        (rids, rsizes) = if psmaller
                                         then (update (i - 1) j (ids set), update (j - 1) size (sizes set))
                                         else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10

creating union find with 10 objects ...DONE
DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]}
All objects are disconnected.
1 and 9 connected ? False
4 and 6 connected ? False
3 and 1 connected ? False
7 and 8 connected ? False

creating unions ...DONE
DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]}
All objects are connected (only 1 group).
1 and 9 connected ? True
4 and 6 connected ? True
3 and 1 connected ? True
7 and 8 connected ? True

real	0m0.002s
user	0m0.000s
sys	0m0.000s

Complete code

Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.

In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.
I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.
As before, we are using Shen as our implementation language.
First we define our disjoint-set type.
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 [Count Ids Sizes] : disjoint-set;)
Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.
\* Create a new disjoint-set type *\
(define new
 { number --&gt; disjoint-set }
 N -&gt; [N (range 1 N) (vector-init 1 N)])
\* Return the number of disjoint sets *\
(define count
 { disjoint-set --&gt; number }
 [Count Ids Sizes] -&gt; Count)
\* Return id of root object *\
(define find-root
 { disjoint-set --&gt; number --&gt; number }
 [Count Ids Sizes] P -&gt; (let Parent 
                         \* Path Compression *\
                         (&lt;-vector Ids (&lt;-vector Ids P))
                         (if (= P Parent)
                             (find-root [Count Ids Sizes] Parent)))
Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.
\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --&gt; number --&gt; number --&gt; boolean }
 UF P Q -&gt; (= (find-root UF P) (find-root UF Q)))
\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --&gt; number --&gt; number --&gt; disjoint-set }
 [Count Ids Sizes] P Q 
 -&gt; (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (&lt;-vector Sizes I)
         SizeJ (&lt;-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (&lt; SizeI SizeJ)
                  (do (vector-&gt; CIds I J) (vector-&gt; CSizes J SizeSum))
                  (do (vector-&gt; CIds J I) (vector-&gt; CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))
After running our test we get the following output.
(50+) (test 10)
creating union find with 10 objects ...
[10 &lt;1 2 3 4 5 6 7 8 9 10&gt; &lt;1 1 1 1 1 1 1 1 1 1&gt;]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ... 
[1 &lt;4 8 7 7 8 8 8 8 8 8&gt; &lt;1 1 1 2 1 1 4 10 1 1&gt;]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true

run time: 0.0 secs
1 : number
All the code can be found here.

Purely Functional Data Structures & Algorithms : Fast Fourier Transform in Qi

In this second post in this series we look at an implementation of the always useful Fast Fourier Transform.

(FFT) An algorithm for computing the Fourier transform of a set of discrete data values. Given a finite set of data points, for example a periodic sampling taken from a real-world signal, the FFT expresses the data in terms of its component frequencies. It also solves the essentially identical inverse problem of reconstructing a signal from the frequency data.

The FFT is a mainstay of numerical analysis. Gilbert Strang described it as “the most important algorithm of our generation”. The FFT also provides the asymptotically fastest known algorithm for multiplying two polynomials.

Our implementation comes in at just under 100 lines of code

(declare atan [number --&gt; number])
(define atan X -&gt; (ATAN X))

(declare cos [number --&gt; number])
(define cos X -&gt; (COS X))

(declare sin [number --&gt; number])
(define sin X -&gt; (SIN X))

(tc +)

 Complex numbers 

(datatype complex
    Real : number; Imag : number;
    [Real Imag] : complex;)

(define complex-mult
  {complex --&gt; complex --&gt; complex}
  [R1 I1] [R2 I2] -&gt; [(- (* R1 R2) (* I1 I2))
                      (+ (* R1 I2) (* I1 R2))])

(define complex-add
  {complex --&gt; complex --&gt; complex}
  [R1 I1] [R2 I2] -&gt; [(+ R1 R2) (+ I1 I2)])

(define complex-diff
  {complex --&gt; complex --&gt; complex}
  [R1 I1] [R2 I2] -&gt; [(- R1 R2) (- I1 I2)])

 Fast Fourier Transform 

(define butterfly-list
    {((list complex) * ((list complex) * (list complex)))
     --&gt; ((list complex) * ((list complex) * (list complex)))}
    (@p X (@p X1 X2)) -&gt; (if (empty? X)
                             (@p X (@p (reverse X1) (reverse X2)))
                              (@p (tail (tail X))
                                  (@p (cons (head X) X1)
                                      (cons (head (tail X)) X2))))))

(define calc-results
    {(((list complex) * (list (list complex))) * 
                        ((list complex) * (list complex)))
     --&gt; (((list complex) * (list (list complex))) * 
                            ((list complex) * (list complex)))}
    (@p (@p [W WN] [YA YB]) (@p Y1 Y2)) -&gt;
    (if (and (empty? Y1) (empty? Y2))
        (@p (@p [W WN] [(reverse YA) (reverse YB)]) (@p Y1 Y2))
         (@p (@p [(complex-mult W WN) WN]
                 [(cons (complex-add  (head Y1) (complex-mult W (head Y2))) YA)
                 (cons (complex-diff (head Y1) (complex-mult W (head Y2))) YB)])
             (@p (tail Y1) (tail Y2))))))

(define fft
    {number --&gt; complex --&gt; (list complex) --&gt; (list complex)
     --&gt; (list complex)}
    1 WN X Y -&gt; [(head X)]
    2 WN X Y -&gt; [(complex-add  (head X) (head (tail X)))
                 (complex-diff (head X) (head (tail X)))]
    N WN X Y -&gt; (let M   (round (/ N 2))
                     Inp (butterfly-list (@p X (@p [] [])))
                     X1  (fst (snd Inp))
                     X2  (snd (snd Inp))
                     Y1  (fft M (complex-mult WN WN) X1 [])
                     Y2  (fft M (complex-mult WN WN) X2 [])
                     W   [1 0]
                     Res (calc-results (@p (@p [W WN] [[] []]) (@p Y1 Y2)))
                     (append (head (snd (fst Res)))
                             (head (tail (snd (fst Res)))))))

(define dotimes-fft
    {number --&gt; number --&gt; complex --&gt; (list complex) --&gt; (list complex)
    --&gt; (list complex)}
    Iterations Size W Input Res -&gt;
    (if ( number --&gt; (list complex) 
     --&gt; (list complex)}
    Iterations Size Input -&gt; (let Pi    (* 4 (atan 1))
                                  Theta (* 2 (/ Pi Size))
                                  W     [(cos Theta) (* -1 (sin Theta))]
                                  (dotimes-fft Iterations Size W Input [])))

Let’s give it a spin …

 Square wave test 

(26-) (time (run-fft 100000 16 
             [[0 0] [1 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 0]
              [0 0] [1 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 0]]))

Evaluation took:
  2.999 seconds of real time
  2.942718 seconds of total run time (2.798716 user, 0.144002 system)
  [ Run times consist of 0.371 seconds GC time, and 2.572 seconds non-GC time. ]
  98.13% CPU
  6,282,874,678 processor cycles
  1,641,619,888 bytes consed

[[8 0] [0.0 0.0] [0.0 0.0] [0.0 0.0] 
 [0.0 0.0] [0.0 0.0] [0.0 0.0] [0.0 0.0]
 [-8 0] [0.0 0.0] [0.0 0.0] [0.0 0.0] 
 [0.0 0.0] [0.0 0.0] [0.0 0.0] [0.0 0.0]] : (list complex)

All Qi code in this post is here.

Chaitin Proving Darwin

White paper : To a mathematical theory of evolution and biological creativity

We present an information-theoretic analysis of Darwin’s theory of
evolution, modeled as a hill-climbing algorithm on a fitness landscape.
Our space of possible organisms consists of computer programs, which
are subjected to random mutations. We study the random walk of in-creasing
fitness made by a single mutating organism. In two different
models we are able to show that evolution will occur and to characterize
the rate of evolutionary progress, i.e., the rate of biological creativity

For many years we have been disturbed by the fact that there is no fundamental
mathematical theory inspired by Darwin’s theory of evolution.
This is the fourth paper in a series attempting to create
such a theory.

In a previous paper we did not yet have a workable mathematical frame-work:
We were able to prove two not very impressive theorems, and then the
way forward was blocked. Now we have what appears to be a good mathematical
framework, and have been able to prove a number of theorems. Things
are starting to work, things are starting to get interesting, and there are many
technical questions, many open problems, to work on.

So this is a working paper, a progress report, intended to promote interest
in the field and get others to participate in the research. There is much to be

Artificial Intuition

Artificial Intuition – A New Possible Path To Artificial Intelligence – by Monica Anderson

Artificial Intelligence was born in Computer Science departments, and inherited their value sets including Correctness. This mindset, this necessity to be logical, provable, and correct has been a fatal roadblock for Artificial Intelligence since its inception. The world is Bizarre, and Logic can not describe it. Artificial Intuition will easily outperform Logic based Artificial Intelligence for almost any problem in a Bizarre problem domain. From the very beginning, Artificial Intelligence should have been a soft science.

Most humans have not been taught logical thinking, but most humans are still intelligent. Most of our daily actions such as walking, talking, and understanding the world are based on Intuition, not Logic.
I capitalize (for stylistic reasons) all major named memes such as “Intuition” and “Logic”
Others have used the label “Artificial Intuition” for other ideas.
I will attempt to show that it is implausible that the brain should be based on Logic. I believe Intelligence emerges from millions of nested micro-intuitions, and that true Artificial Intelligence requires Artificial Intuition.
Intuition is surprisingly easy to implement in computers, but requires a lot of memory.

In what follows I will argue that AN approaches are Biologically Plausible; that they rather elegantly sidestep many problems and limitations of Logic-based AI approaches; and that they are likely to be implementable in current or near-future generations of computer hardware.

LLATech’s “Artificial Intelligence And Intuition” article :

Roger Penrose considered it impossible. Thinking could never imitate a computer process. He said as much in his book, The Emperor’s New Mind. But, a new book, The Intuitive Algorithm, (IA), suggested that intuition was a pattern recognition process. Intuition propelled information through many neural regions like a lightning streak. Data moved from input to output in a reported 20 milliseconds. The mind saw, recognized, interpreted and acted. In the blink of an eye. Myriad processes converted light, sound, touch and smell instantly into your nerve impulses. A dedicated region recognized those impulses as objects and events. The limbic system, another region, interpreted those events to generate emotions. A fourth region responded to those emotions with actions. The mind perceived, identified, evaluated and acted. Intuition got you off the hot stove in a fraction of a second. And it could be using a simple algorithm.

Wired : ‘Artificial Intuition,’ Earthquake Detectors vie for Pentagon Prize

… an Israeli high-tech firm that has developed “artificial intuition” software that can scan large batches of documents in Arabic and other languages. According to the company’s website, this tool can “instantly assesses any Arabic-language document, determines whether it contains content of a terrorist nature or of intelligence value, provides a first-tier Intelligence Analysis Report of the main requirement-relevant elements in the document.

π in assembly (spigot algorithm)

//   pi_spigot.s - calculates Pi using a spigot algorithm
//                 as an array of n digits in base 10000.
//                 http://mathworld.wolfram.com/SpigotAlgorithm.html
//  x86-64/SSE3 with for Linux, Intel, gnu assembler, gcc
//  assemble: as pi_spigot.s -o pi_spigot.o
//  link:     gcc -o pi_spigot pi_spigot.o
//  example run:      ./pi_spigot 100
//  output: 3.14159265358979323846264338327950288419716939937510582097494459230 ...
//        ... 78164062862089986280348253421170679

    .section    .rodata
    .string "%d."
    .string "%04d"
.globl print
    .type   print, @function
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movq    %rdi, -24(%rbp)
    movl    %esi, -28(%rbp)
    movq    -24(%rbp), %rax
    addq    $2, %rax
    movzwl  (%rax), %eax
    movzwl  %ax, %edx
    movl    $.LC0, %eax
    movl    %edx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movl    $2, -4(%rbp)
    jmp .L2
    movl    -4(%rbp), %eax
    addq    %rax, %rax
    addq    -24(%rbp), %rax
    movzwl  (%rax), %eax
    movzwl  %ax, %edx
    movl    $.LC1, %eax
    movl    %edx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    addl    $1, -4(%rbp)
    movl    -28(%rbp), %eax
    subl    $1, %eax
    cmpl    -4(%rbp), %eax
    jg  .L3
    movl    $10, %edi
    call    putchar
    .size   print, .-print
.globl main
    .type   main, @function
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    pushq   %rbx
    subq    $56, %rsp
    movl    %edi, -52(%rbp)
    movq    %rsi, -64(%rbp)
    cmpl    $1, -52(%rbp)
    jle .L6
    .cfi_offset 3, -24
    movq    -64(%rbp), %rax
    addq    $8, %rax
    movq    (%rax), %rax
    movq    %rax, %rdi
    call    atoi
    addl    $3, %eax
    leal    3(%rax), %edx
    testl   %eax, %eax
    cmovs   %edx, %eax
    sarl    $2, %eax
    addl    $3, %eax
    jmp .L7
    movl    $253, %eax
    movl    %eax, -20(%rbp)
    movl    -20(%rbp), %eax
    addq    %rax, %rax
    movq    %rax, %rdi
    call    malloc
    movq    %rax, -40(%rbp)
    movl    -20(%rbp), %eax
    leaq    (%rax,%rax), %rdx
    movq    -40(%rbp), %rax
    movl    $0, %esi
    movq    %rax, %rdi
    call    memset
    movq    -40(%rbp), %rax
    addq    $2, %rax
    movw    $4, (%rax)
    cvtsi2sd    -20(%rbp), %xmm0
    movsd   .LC2(%rip), %xmm1
    mulsd   %xmm1, %xmm0
    cvttsd2si   %xmm0, %eax
    movl    %eax, -24(%rbp)
    jmp .L8
    movl    $0, -32(%rbp)
    movl    -20(%rbp), %eax
    subl    $1, %eax
    movl    %eax, -28(%rbp)
    jmp .L9
    movl    -28(%rbp), %eax
    addq    %rax, %rax
    addq    -40(%rbp), %rax
    movzwl  (%rax), %eax
    movzwl  %ax, %eax
    imull   -24(%rbp), %eax
    addl    %eax, -32(%rbp)
    movl    -28(%rbp), %eax
    addq    %rax, %rax
    movq    %rax, %rbx
    addq    -40(%rbp), %rbx
    movl    -32(%rbp), %ecx
    movl    $1759218605, %edx
    movl    %ecx, %eax
    imull   %edx
    sarl    $12, %edx
    movl    %ecx, %eax
    sarl    $31, %eax
    movl    %edx, %esi
    subl    %eax, %esi
    movl    %esi, %eax
    imull   $10000, %eax, %eax
    movl    %ecx, %edx
    subl    %eax, %edx
    movl    %edx, %eax
    movw    %ax, (%rbx)
    movl    -32(%rbp), %ecx
    movl    $1759218605, %edx
    movl    %ecx, %eax
    imull   %edx
    sarl    $12, %edx
    movl    %ecx, %eax
    sarl    $31, %eax
    movl    %edx, %ecx
    subl    %eax, %ecx
    movl    %ecx, %eax
    movl    %eax, -32(%rbp)
    subl    $1, -28(%rbp)
    cmpl    $0, -28(%rbp)
    jns .L10
    movl    $0, -44(%rbp)
    movl    -44(%rbp), %eax
    movl    %eax, -48(%rbp)
    movl    $0, -28(%rbp)
    jmp .L11
    movl    -24(%rbp), %eax
    addl    %eax, %eax
    leal    1(%rax), %edx
    movl    -28(%rbp), %eax
    addq    %rax, %rax
    addq    -40(%rbp), %rax
    movzwl  (%rax), %eax
    movzwl  %ax, %ecx
    movl    -44(%rbp), %eax
    imull   $10000, %eax, %eax
    leal    (%rcx,%rax), %eax
    movl    %edx, %esi
    movl    %eax, %edi
    call    div
    movq    %rax, -48(%rbp)
    movl    -28(%rbp), %eax
    addq    %rax, %rax
    addq    -40(%rbp), %rax
    movl    -48(%rbp), %edx
    movw    %dx, (%rax)
    addl    $1, -28(%rbp)
    movl    -28(%rbp), %eax
    cmpl    -20(%rbp), %eax
    jl  .L12
    movq    -40(%rbp), %rax
    addq    $2, %rax
    movq    -40(%rbp), %rdx
    addq    $2, %rdx
    movzwl  (%rdx), %edx
    addl    $2, %edx
    movw    %dx, (%rax)
    subl    $1, -24(%rbp)
    cmpl    $0, -24(%rbp)
    jg  .L13
    movl    -20(%rbp), %edx
    movq    -40(%rbp), %rax
    movl    %edx, %esi
    movq    %rax, %rdi
    call    print
    movl    $0, %eax
    addq    $56, %rsp
    popq    %rbx
    .size   main, .-main
    .section    .rodata
    .align 8
    .long   3161095930
    .long   1076532084

Happy π approximation day/night (in assembly) !

//   pi_x64.s - calculates Pi using the Leibniz formula.
//              Each iteration prints a closer approximation to 50 digits.
//              This is not an optimal implementation and it runs forever.
//  x86-64/SSE3 with for Linux, Intel, gnu assembler, gcc
//  assemble: as pi_x64.s -o pi_x64.o
//  link:     gcc -o pi_x64 pi_x64.o
//  run:      ./pi_x64
//  output: 3.14159264858204423376264458056539297103881835937500
//          3.14159265108366625440794450696557760238647460937500
//          3.14159265191852199450295302085578441619873046875000
//          3.14159265233600137889879988506436347961425781250000
//          .... and on forever ...

.section .data
        .align 16
denom:  .double  1.0, 3.0
numer:  .double  4.0, -4.0
add4:   .double  4.0, 4.0
zero:   .double  0.0, 0.0
msg:    .string  "%1.50fn"

.section .text
        .globl main
        .type main, @function
        .align 64
        pushq   %rbp
        movq    %rsp, %rbp

        movdqa  (numer), %xmm2
        movdqa  (denom), %xmm6
        movdqa  (add4), %xmm3
        movdqa  %xmm2, %xmm4
        movdqa  (zero), %xmm5
        movq    $100000000, %r12

        divpd  %xmm6, %xmm2
        addpd  %xmm2, %xmm5
        movdqa %xmm4, %xmm2
        addpd  %xmm3, %xmm6

        subq $1, %r12
        jnz loop

        movq   $100000000, %r12
        movdqa %xmm5, %xmm0
        movdqa %xmm6, %xmm1
        haddpd %xmm0, %xmm0

        movq  $1, %rax
        movq $msg, %rdi
        call printf

        movdqa (add4), %xmm3

        jmp loop

        movq $0, %rax
        popq %rbp

Generating π in Haskell

Haskell beats CL quite comfortably using the same algorithm :

module Main( main ) where

import System( getArgs )

arccot :: Integer -> Integer -> Integer
arccot x unity =
    arccot' x unity 0 start 1 1
      where start = unity `div` x
            arccot' x unity sum xpower n sign | xpower `div` n == 0 = sum
                                              | otherwise           =
                arccot' x unity (sum + sign*term) (xpower `div` (x*x)) (n+2) (-sign)
                  where term = xpower `div` n

machin_pi :: Integer -> Integer
machin_pi digits = 
    pi' `div` (10 ^ 10)
      where unity = 10 ^ (digits+10)
            pi' = 4 * (4 * arccot 5 unity - arccot 239 unity)

main :: IO ()
main = do
    args <- getArgs
    putStrLn (show (machin_pi (read (head args) :: Integer)))

The first 10000 digits.

> time ./machin_pi 10000

31415926535897932384626433832795028841971693993751058209749445923078164062862089 ...

real    0m0.381s
user    0m0.290s
sys     0m0.061s

Saintly Man? – That Mitchell & Webb Look – BBC Two