// pi_x64.s - calculates Pi using the Leibniz formula. // Each iteration prints a closer approximation to 50 digits. // This is not an optimal implementation and it runs forever. // // x86-64/SSE3 with for Linux, Intel, gnu assembler, gcc // // assemble: as pi_x64.s -o pi_x64.o // link: gcc -o pi_x64 pi_x64.o // run: ./pi_x64 // output: 3.14159264858204423376264458056539297103881835937500 // 3.14159265108366625440794450696557760238647460937500 // 3.14159265191852199450295302085578441619873046875000 // 3.14159265233600137889879988506436347961425781250000 // .... and on forever ... .section .data .align 16 denom: .double 1.0, 3.0 numer: .double 4.0, -4.0 add4: .double 4.0, 4.0 zero: .double 0.0, 0.0 msg: .string "%1.50fn" .section .text .globl main .type main, @function .align 64 main: pushq %rbp movq %rsp, %rbp movdqa (numer), %xmm2 movdqa (denom), %xmm6 movdqa (add4), %xmm3 movdqa %xmm2, %xmm4 movdqa (zero), %xmm5 movq $100000000, %r12 loop: divpd %xmm6, %xmm2 addpd %xmm2, %xmm5 movdqa %xmm4, %xmm2 addpd %xmm3, %xmm6 subq $1, %r12 jnz loop movq $100000000, %r12 movdqa %xmm5, %xmm0 movdqa %xmm6, %xmm1 haddpd %xmm0, %xmm0 movq $1, %rax movq $msg, %rdi call printf movdqa (add4), %xmm3 jmp loop movq $0, %rax popq %rbp ret

# Category Archives: Programming

Reply

# Generating π in Haskell

Haskell beats CL quite comfortably using the same algorithm :

module Main( main ) where import System( getArgs ) arccot :: Integer -> Integer -> Integer arccot x unity = arccot' x unity 0 start 1 1 where start = unity `div` x arccot' x unity sum xpower n sign | xpower `div` n == 0 = sum | otherwise = arccot' x unity (sum + sign*term) (xpower `div` (x*x)) (n+2) (-sign) where term = xpower `div` n machin_pi :: Integer -> Integer machin_pi digits = pi' `div` (10 ^ 10) where unity = 10 ^ (digits+10) pi' = 4 * (4 * arccot 5 unity - arccot 239 unity) main :: IO () main = do args <- getArgs putStrLn (show (machin_pi (read (head args) :: Integer)))

The first 10000 digits.

> time ./machin_pi 10000 31415926535897932384626433832795028841971693993751058209749445923078164062862089 ... real 0m0.381s user 0m0.290s sys 0m0.061s

Saintly Man? – That Mitchell & Webb Look – BBC Two

# Generating π in CL (faster)

Thanks to metacircular for pointing out that (floor (/ x y)) can be written as (floor x y) while avoiding

the intermediate rational.

(defun machin-pi (digits) "Calculates PI digits using fixed point arithmetic and Machin's formula with double recursion" (labels ((arccot-minus (xsq n xpower) (let ((term (floor xpower n))) (if (= term 0) 0 (- (arccot-plus xsq (+ n 2) (floor xpower xsq)) term)))) (arccot-plus (xsq n xpower) (let ((term (floor (/ xpower n)))) (if (= term 0) 0 (+ (arccot-minus xsq (+ n 2) (floor xpower xsq)) term)))) (arccot (x unity) (let ((xpower (floor (/ unity x)))) (arccot-plus (* x x) 1 xpower)))) (let* ((unity (expt 10 (+ digits 10))) (thispi (* 4 (- (* 4 (arccot 5 unity)) (arccot 239 unity))))) (floor thispi (expt 10 10)))))

The first 10000 digits again.

* (time (machin-pi 10000)) Evaluation took: 0.662 seconds of real time 0.634038 seconds of total run time (0.495454 user, 0.138584 system) [ Run times consist of 0.233 seconds GC time, and 0.402 seconds non-GC time. ] 95.77% CPU 1,491,387,858 processor cycles 109,530,592 bytes consed 31415926535897932384626433832795028841971693993751058209749445923078164062862089 ...

Algorithmic optimizations would take us much further. For example the Gauss–Legendre or Salamin–Brent formula.

Then there is the fastest known(at the turn of the millenium), Chudnovsky’s formula :

# Generating π in CL

Update 2009-07-23 : Faster version in CL and a Haskell version.

——————————————————————————–

A trivial approximation using the Leibniz formula.

(defun leibniz-pi() (labels ((local-pi(sum n) (if (< n (expt 10 6)) (local-pi (+ sum (/ (expt -1 n) (+ (* 2 n) 1))) (+ n 1)) sum))) (* 4 (local-pi 0f0 0f0))))

And here’s a longer version(but faster and more precise) using Machin’s formula with fixed point arithmetic to x digits.

(defun machin-pi (digits) "Calculates PI digits using fixed point arithmetic and Machin's formula with double recursion" (labels ((arccot-minus (xsq n xpower) (let ((term (floor (/ xpower n)))) (if (= term 0) 0 (- (arccot-plus xsq (+ n 2) (floor (/ xpower xsq))) term)))) (arccot-plus (xsq n xpower) (let ((term (floor (/ xpower n)))) (if (= term 0) 0 (+ (arccot-minus xsq (+ n 2) (floor (/ xpower xsq))) term)))) (arccot (x unity) (let ((xpower (floor (/ unity x)))) (arccot-plus (* x x) 1 xpower)))) (let* ((unity (expt 10 (+ digits 10))) (thispi (* 4 (- (* 4 (arccot 5 unity)) (arccot 239 unity))))) (floor (/ thispi (expt 10 10))))))

The first 10000 digits.

* (time (pidigits 10000)) Evaluation took: 2.496 seconds of real time 1.149998 seconds of total run time (0.974647 user, 0.175351 system) [ Run times consist of 0.325 seconds GC time, and 0.825 seconds non-GC time. ] 46.07% CPU 5,626,335,988 processor cycles 217,893,792 bytes consed 31415926535897932384626433832795028841971693993751058209749445923078164062862089 ...