https://github.com/jgrant27/jngmisc/tree/master/rust/machin-pi

# Category Archives: Functional Programming

# Two sorts with Rust

Here are some initial thoughts on Rust in the almost two years since I last looked at it along with some implementations of merge and quick sort. (These are just my opinions so please don’t panic !)

1. Cargo is awesome for managing package dependencies and building projects.

2. Rust is a very nice systems programming language that supports a functional style of programming. Much easier to work with than C/C++ with very near the same performance in many cases.

3. Strong static inferred type system !

4. The authors have not neglected the rich and varied history of programming language theory while trying to be very pragmatic about the design and usefulness of the language.

Let’s look at implementing some popular sorting algorithms. First … quick sort.

I won’t be pedantic here by explaining the quick sort algorithm but an elegant and performant implementation is ~20 LOC. Not bad for a systems level programming language.

pub fn quicksort_rec(nums: Vec) -> Vec { return match nums.len() { cnt if cnt <= 1 => nums, cnt => { let mut left = Vec::new(); let mut right = Vec::new(); let pivot = nums[0]; for i in 1..cnt { match nums[i] { num if num < pivot => left.push(num), num => right.push(num), } } let mut left_sorted = quicksort_rec(left); let mut right_sorted = quicksort_rec(right); left_sorted.push(pivot); left_sorted.append(&mut right_sorted); return left_sorted; }, }; }

An implementation of merge sort is a bit longer at ~35 LOC.

fn merge(mut left: Vec, mut right: Vec ) -> Vec { let mut merged = Vec::new(); while !left.is_empty() && !right.is_empty() { if left.last() >= right.last() { merged.push(left.pop().unwrap()); } else { merged.push(right.pop().unwrap()); } } while !left.is_empty() { merged.push(left.pop().unwrap()); } while !right.is_empty() { merged.push(right.pop().unwrap()); } merged.reverse(); return merged; } pub fn mergesort_rec(nums: Vec ) -> Vec { return match nums.len() { cnt if cnt <= 1 => nums, cnt => { let mut left = Vec::new(); let mut right = Vec::new(); let middle = cnt / 2; for i in (0..middle).rev() { left.push(nums[i]); } for i in (middle..cnt).rev() { right.push(nums[i]); } left = mergesort_rec(left); right = mergesort_rec(right); return merge(left, right); }, }; }

Lastly, here are the timings for my very CPU under-powered laptop …

# Ring probabilities in F#

A few months back I took a look at Elixir. More recently I’ve been exploring F# and I’m very pleased with the experience so far. Here is the ring probabilities algorithm implemented using F#. It’s unlikely that I will ever use Elixir again because having a powerful static type system provided by F# at my disposal is just too good.

let rec calcStateProbs (prob: float, i: int, currProbs: float [], newProbs: float []) = if i < 0 then newProbs else let maxIndex = currProbs.Length-1 // Match prev, next probs based on the fact that this is a // ring structure. let (prevProb, nextProb) = match i with | i when i = maxIndex -> (currProbs.[i-1], currProbs.[0]) | 0 -> (currProbs.[maxIndex], currProbs.[i+1]) | _ -> (currProbs.[i-1], currProbs.[i+1]) let newProb = prob * prevProb + (1.0 - prob) * nextProb Array.set newProbs i newProb calcStateProbs(prob, i-1, currProbs, newProbs) let calcRingProbs parsedArgs = // Probs at S = 0. // Make certain that we are positioned at only start location. // e.g. P(Start Node) = 1 let startProbs = Array.concat [ [| 1.0 |] ; [| for _ in 1 .. parsedArgs.nodes - 1 -> 0.0 |] ] let endProbs = List.fold (fun probs _ -> calcStateProbs(parsedArgs.probability, probs.Length-1, probs, Array.create probs.Length 0.0)) startProbs [1..parsedArgs.states] endProbs

Here’s the code.

No promises this time but I may follow this sequential version up with a parallelized version.

# Ring probabilities with Elixir

I’ve been hearing more about Elixir lately so I thought I’d take it for a spin.

*“Elixir is a functional, meta-programming aware language built on top of the Erlang VM. It is a dynamic language that focuses on tooling to leverage Erlang’s abilities to build concurrent, distributed and fault-tolerant applications with hot code upgrades.”
*

I’ve never really spent any time with Erlang but always been curious about it and the fact that it’s one of the best kept ‘secrets’ in many startups these days. I’ve heard for years how easy it is to ‘scale out’ compared with many other languages and platforms.

Joe Armstrong, the creator of Erlang, wrote a post about Elixir in which he seemed to really like it except for some minor things. This got me even more curious so I decided to write some code that seemed like it could benefit from the features provided by Elixir for easily making suitable algorithms parallelizable.

Let’s talk about Ring probabilities. Let’s say we had a cluster of N nodes in a ring topology. We then might have some code that requires S steps to be run and each subsequent step is run on a node to the right of the previous node with some probability P.

In the initial state (S=0) the probability of some piece of code running on node A is P=1.

At the next step (S=1) the probability of the step running on a node to the right in the ring is P and the probability of the step running on a node to the left is 1-P.

Here is an example with some crude ascii diagrams to visually represent this :

Initial node probablity for 5 node ring at S=0 is P=1 for starting node. N = 5 (nodes) For S = 0 (initial state) 1 - P = 0.5βββββββββββββββββ P = 0.5 Counter-clockwise Clockwise +-----+ |βP =β| +----------+β1.0β+----------+ |ββββββββββ+-----+ββββββββββ| +--+--+βββββββββββββββββββββ+--+--+ |β0.0 |βββββββββββββββββββββ|β0.0β| +--+--+βββββββββββββββββββββ+--+--+ |βββββββββββββββββββββββββββ| |βββββββββββββββββββββββββββ| |ββ+-----+βββββββββ+-----+ββ| +--+β0.0 +---------+β0.0β+--+ +-----+βββββββββ+-----+ Node probablities for the same 5 node ring after 2 state transitions N = 5 (nodes) S = 2 1 - P = 0.5βββββββββββββββββ P = 0.5 Counter-clockwiseβ Clockwise +-----+ |βP =β| +----------+β0.5β+-------------+ |ββββββ β +-----+ββ ββ βββββ β| +--+--+βββββββββ βββββββββββ β +--+--+ |β0.0 |βββββββ βββββββββββ βββ |β0.0β| +--+--+ββββββ βββββββββββ ββββ +--+--+ |β βββββββββββ ββββββββββββββ | | ββ βββββββββββββββββββββββββ | |ββ+------+βββββββββ+-------+ββ| +--+β0.25 +---------+β0.25β +--+ +------+βββββββββ+-------+

Let’s first write the sequential version of the algorithm to calculate the ring probabilities. The parallel version will be handled in the next post. Data types in Elixir are pretty basic at this point with Elixir still having not reached 1.0. I decided to use an array to represent the ring in anticipation of later parallelizing the algorithm. A list seemed unsuitable for this due to access times being linear and that a parallel map across the structure would most likely be required. For a sequential version it’s interesting that a list is the fastest data structure to use in combination with recursion and pattern matching but I’ll get into that in the next post.

For now let’s get back to implementing a sequential version with an array and the map function …

Elixir doesn’t have an array or vector type (yet ?). I’m not going to comment on this. Instead we will use Erlang’s array type. Dipping into Erlang libraries from Elixir is pretty trivial so it’s no big deal other than Elixir’s parameter conventions for function calls is the reverse of Erlang’s and this can be a little annoying.

Let’s look at the function for calculating the node probabilities given a direction change probability, number of nodes and state count :

def calc_ring_probs(p, n, s) when is_float(p) and p >= 0 and p <= 1 and is_integer(n) and n > 0 and is_integer(s) and s >= 0 do # Probs at S = 0. # Certain that we are positioned at only start location. # e.g. P(Start Node) = 1 initial_probs = :array.new [size: n, fixed: true, default: 0.0] initial_probs = :array.set 0, 1.0, initial_probs final_probs = initial_probs IO.puts "Calculating ring node probabilities where P=#{p} N=#{n} S=#{s} ...\n" # If we are moving beyond the initial state then do the calculation ... if s > 0 do # ... through all the states ... final_probs = reduce 1..s, initial_probs, fn (_, new_probs) -> calc_state_probs(p, new_probs) end end final_probs end

The first thing you might notice at the beginning of calc_ring_probs are the guard clauses (when …) after the function parameter definition. This is a nice way of ensuring some pre-conditions are met for the function to return meaningful results.

We check that the probability parameter is a float within the range 0.0 -> 1.0, we also make sure that there are more than zero nodes and this must be an integer and that the state is either zero or more and also an integer.

Next the initial probabilities are created using an Erlang array. If the state required is not the initial state (S=0) then we reduce for the number of states and calculate the probabilities of the ring at each state(calc_state_probs) until we reach the final state.

Now let’s look at the implementation of calc_state_probs.

def calc_state_probs(p, prev_probs) when is_float(p) and p >= 0 and p <= 1 do sz = :array.size(prev_probs) :array.map fn(i, _) -> prev_i = if i == 0 do sz - 1 else i - 1 end prev_prob = :array.get(prev_i, prev_probs) next_i = if i == sz - 1 do 0 else i + 1 end next_prob = :array.get(next_i, prev_probs) p * prev_prob + (1 - p) * next_prob end, prev_probs end

The function takes the probability P and an array of the previous probabilities. We determine the previous and next node probability indexes based on the current index. If the current index is the first or last index in the array then the previous index is the last and the next index is the first, respectively. We calculate the current index using the respective probability P or 1-P and the previous and next node probabilities.

That’s really all there is to the sequential version.

On a macbook air a 1,000,000 node ring over 10 state changes takes ~7.4 seconds.

bash-3.2$ mix run lib/ring_probs.ex 0.5 1000000 10 Calculating ring node probabilities where P=0.5 N=1000000 S=10 ... {:array, 50, 0, 0.0, {{0.24609375, 0.0, 0.205078125, 0.0, 0.1171875, 0.0, 0.0439453125, 0.0, 0.009765625, 0.0}, {9.765625e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, 10, 10, 10, 10, 10, 10}} ... 999950 node probabilities ... {:array, 999950, 0, 0.0, {{{{{{0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}, {9.765625e-4, 0.0, 0.009765625, 0.0, 0.0439453125, 0.0, 0.1171875, 0.0, 0.205078125, 0.0}, 10, 10, 10, 10, 10, 10}, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100}, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000}, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000}, 100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000, 100000}} calc time: 7370.12 msecs

The complete code can be found here.

My reluctance with Elixir is that it’s a strong dynamically typed language. This is much the same issue I’ve had with Erlang. There are ways to work around this. One way is using a static analysis tool. Read this for more info. Apparently success types are a way to correctly infer types in Erlang. I can’t say that I’m convinced, my own experience has shown that any production system that needs to scale at least requires something like type-hinting. I might be wrong and in fact I hope I am because I like what I’ve seen regarding Elixir and heard about the Erlang VM for building distributed systems.

In the next post we’ll re-write the code to make it run concurrently and also look at how a sequential version of the algorithm using recursion, pattern matching and lists is an order of magnitude faster than the sequential version using arrays in this post.

The sequential recursive version may even be faster than a concurrent version depending on how many cores your machine has ;-)

# Clojure’s growing popular mind share

Popular interest in Clojure has rapidly increased over the last few years since 2008, almost to the level of Java (the language) today, which has dropped off significantly.Β (At least according to Google web trends.)

In contrast, popular interest in Common Lisp seems to have dropped off steadily since 2004.

I used “java language” instead of “java” because it is ambiguous enough to mean the language, framework, JVM, the island or the coffee.

# Purely Functional Data Structures & Algorithms : Selection Sort

***Updated @ 2012-08-31 02:08:58 due to internet pedantry***

According to Wikipedia :

InΒ computer science, aΒ

Selection sortΒ is aΒ sorting algorithm, specifically anΒ in-placeΒ comparison sort. It hasΒ O(n^{2}) time complexity, making it inefficient on large lists, and generally performs worse than the similarΒ insertion sort. Selection sort is noted for its simplicity, and also has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

(A functional implementation of selection sort is however, **not** an in-place sort.)

Behold the abomination which is the imperative implementation (from theΒ WikipediaΒ link) :

int i,j; int iMin; for (j = 0; j < n-1; j++) { iMin = j; for ( i = j+1; i < n; i++) { if (a[i] < a[iMin]) { iMin = i; } } if ( iMin != j ) { swap(a[j], a[iMin]); } }

Now, the functional equivalent in Haskell :

selectionSort :: [Int] -> [Int] -> [Int] selectionSort sorted [] = reverse sorted selectionSort sorted unsorted = selectionSort (min:sorted) (delete min unsorted) where min = minimum unsorted

Or in Shen :

(define selection-sort-aux { (list number) --> (list number) --> (list number) } Sorted [] -> (reverse Sorted) Sorted Unsorted -> (let Min (minimum Unsorted) (selection-sort-aux (cons Min Sorted) (remove-first Min Unsorted))))

Yes. These functional snippets use their respective implementations of the list type (which is not an efficient persistent data type in either Haskell or Shen for accesses or updates). Replacing the List type with Data.Sequence(a persistent data type with efficient constant access and update) for the Haskell snippet is trivial. I’ll leave that as an exercise to the reader. Shen is too new to support theseΒ efficientΒ persistent types at the moment but implementations will appear in the future and changing the snippet would also be trivial. A Clojure implementation using it’s already built in efficient persistent types would also be trivial.

The complete code can be found here.

# Purely Functional Data Structures & Algorithms : Union-Find (Haskell)

***Updated 08-23-2012 01:04:38***

**Replaced the use of Data.Vector with the persistent Data.Sequence which has O(logN) worst case time complexity on updates.**

A Haskell version of the previous codeΒ using the more efficient(access and update) persistent Data.Sequence type so that the desired time complexity is maintained for the union operation.

-- Disjoint set data type (weighted and using path compression). -- O((M+N)lg*N + 2MlogN) worst-case union time (practically O(1)) -- For M union operations on a set of N elements. -- O((M+N)lg*N) worst-case find time (practically O(1)) -- For M connected(find) operations on a set of N elements. data DisjointSet = DisjointSet { count :: Int, ids :: (Seq Int), sizes :: (Seq Int) } deriving (Read, Show) -- Return id of root object findRoot :: DisjointSet -> Int -> Int findRoot set p | p == parent = p | otherwise = findRoot set parent where parent = index (ids set) (p - 1) -- Are objects P and Q connected ? connected :: DisjointSet -> Int -> Int -> Bool connected set p q = (findRoot set p) == (findRoot set q) -- Replace sets containing P and Q with their union quickUnion :: DisjointSet -> Int -> Int -> DisjointSet quickUnion set p q | i == j = set | otherwise = DisjointSet cnt rids rsizes where (i, j) = (findRoot set p, findRoot set q) (i1, j1) = (index (sizes set) (i - 1), index (sizes set) (j - 1)) (cnt, psmaller, size) = (count set - 1, i1 < j1, i1 + j1) -- Always make smaller root point to the larger one (rids, rsizes) = if psmaller then (update (i - 1) j (ids set), update (j - 1) size (sizes set)) else (update (j - 1) i (ids set), update (i - 1) size (sizes set))

Tested …

jgrant@aristotle:~/jngmisc/haskell$ ghc quick_union.hs ; time ./quick_union 10 creating union find with 10 objects ...DONE DisjointSet {count = 10, ids = fromList [1,2,3,4,5,6,7,8,9,10], sizes = fromList [1,1,1,1,1,1,1,1,1,1]} All objects are disconnected. 1 and 9 connected ? False 4 and 6 connected ? False 3 and 1 connected ? False 7 and 8 connected ? False creating unions ...DONE DisjointSet {count = 1, ids = fromList [4,8,7,7,8,8,8,8,8,8], sizes = fromList [1,1,1,2,1,1,4,10,1,1]} All objects are connected (only 1 group). 1 and 9 connected ? True 4 and 6 connected ? True 3 and 1 connected ? True 7 and 8 connected ? True real 0m0.002s user 0m0.000s sys 0m0.000s

# Purely Functional Data Structures & Algorithms : Union-Find

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forestsΒ and the complementary algorithm : union-find.

*InΒ computing, aΒ disjoint-set data structureΒ is aΒ data structureΒ that keeps track of aΒ setΒ of elementsΒ partitionedΒ into a number ofΒ disjointΒ (nonoverlapping) subsets. AΒ union-find algorithmΒ is an algorithm that performs two useful operations on such a data structure:*

*Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.**Union: Join two subsets into a single subset.*

I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.

\**\ \* Disjoint set data type (weighted and using path compression) demonstrating *\ \* 5(m + n) worst-case find time *\ \**\ (datatype disjoint-set Count : number ; Ids : (vector number) ; Sizes : (vector number); ================================================================= [Count Ids Sizes] : disjoint-set;)

\* Create a new disjoint-set type *\ (define new { number --> disjoint-set } N -> [N (range 1 N) (vector-init 1 N)])

\* Return the number of disjoint sets *\ (define count { disjoint-set --> number } [Count Ids Sizes] -> Count)

\* Return id of root object *\ (define find-root { disjoint-set --> number --> number } [Count Ids Sizes] P -> (let Parent \* Path Compression *\ (<-vector Ids (<-vector Ids P)) (if (= P Parent) P (find-root [Count Ids Sizes] Parent)))

\* Are objects P and Q in the set ? *\ (define connected { disjoint-set --> number --> number --> boolean } UF P Q -> (= (find-root UF P) (find-root UF Q)))

\* Replace sets containing P and Q with their union *\ (define quick-union { disjoint-set --> number --> number --> disjoint-set } [Count Ids Sizes] P Q -> (let UF [Count Ids Sizes] I (find-root UF P) J (find-root UF Q) SizeI (<-vector Sizes I) SizeJ (<-vector Sizes J) SizeSum (+ SizeI SizeJ) CIds (vector-copy Ids) CSizes (vector-copy Sizes) (if (= I J) [Count CIds CSizes] \* Always make smaller root point to larger one *\ (do (if (< SizeI SizeJ) (do (vector-> CIds I J) (vector-> CSizes J SizeSum)) (do (vector-> CIds J I) (vector-> CSizes I SizeSum))) [(- Count 1) CIds CSizes]))))

(50+) (test 10) creating union find with 10 objects ... DONE [10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>] All objects are disconnected : 1 and 9 connected ? false 4 and 6 connected ? false 3 and 1 connected ? false 7 and 8 connected ? false ... creating unions ... DONE [1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>] All objects should be connected as there is only 1 group : 1 and 9 connected ? true 4 and 6 connected ? true 3 and 1 connected ? true 7 and 8 connected ? true run time: 0.0 secs 1 : number

# Welcome to John McCarthy’s new website.

From the website: John was a legendary computer scientist at Stanford University who developed time-sharing, invented LISP, and founded the field of Artificial Intelligence.* In March 2011 John launched Project JMC with the objective to make his work more approachable and accessible. The Project JMC team is continuing to help realize his objective. In this site you will find all John’s work, including his social commentary, and acknowledgements of his outstanding contributions and impact. Additional comments, suggestions, stories, photographs and videos on John and his work are very welcome. Please send them to the Project JMC team. His old website is here.

# Quick Sort in Shen

A Shen type-checked implementation of Quick Sort is even more elegant/terse compared with the CL version posted previously.

Pattern-matching and currying make this possible.

(tc +) (define filter {(A --> boolean) --> (list A) --> (list A)} _ [] -> [] T? [A | B] -> (append [A] (filter T? B)) where (T? A) T? [_ | B] -> (filter T? B)) (define quick-sort-generic {(list A) --> (A --> A --> boolean) --> (A --> A --> boolean) --> (list A)} [] _ _ -> [] [A | B] L? R? -> (append (quick-sort-generic (filter (R? A) B) L? R?) [A] (quick-sort-generic (filter (L? A) B) L? R?))) \* descending with duplicates *\ * (quick-sort-generic [3 1 2 7 9 6 6 3 0] >= <) * [9 7 6 6 3 3 2 1 0] : (list number)

The complete code can be found here. Based on this numeric version.