Google to acquire ITA?

Update 2010-06-30: Just over a day after I posted this entry, Google announced that it had acquired ITA. Announcement


There was buzz back in April about Google possibly acquiring ITA Software.

A few days ago, Dan posted that these were just rumors stemming from a single Bloomberg article.

Well, it seems it may be more than just a rumor: The Wall Street Journal now has an article regarding the potential acquisition.
The silence seems quite telling: “Google declined to comment. ITA didn’t respond to calls seeking comment.”

It is also fairly interesting that ITA uses a combination of languages very similar to Google's own: Python, C++, and Java. Could this be a factor in Google's interest?

Hopefully, if this does happen, the ITA Lisp hackers will get a substantial financial reward for all their innovative work. Fingers crossed …

Summer 2010 reading

“Let over Lambda – 50 Years of Lisp” by Doug Hoyte

This one had been sitting on my bookshelf for almost a year.

“Let Over Lambda is one of the most hardcore computer programming books out there. Starting with the fundamentals, it describes the most advanced features of the most advanced language: COMMON LISP. The point of this book is to expose you to ideas that you might otherwise never be exposed to.”

and

“Macros are what make lisp the greatest programming language in the world.”

and

“If you are looking for a dry coding manual that re-hashes common-sense techniques in whatever langue du jour, this book is not for you. This book is about pushing the boundaries of what we know about programming.”

“Only the top percentile of programmers use Lisp and if you can understand this book you are in the top percentile of Lisp programmers.”

I’m sure that the author has received his fair share of derision for making statements like these, and for his clear and well-researched content showing how Lisp is the greatest programming language ever invented. It may come off as conceited to some, but he is right. He’s also in good company: the greatest computer scientists ever known have made similarly bold statements regarding Lisp. This style is lacking in most technical writing today; there has been too much coddling and dilution in programming texts over the last decade. The fact is that unless you have mastered this language, you are in no position to even begin discussing this topic. This book, much like any of the posts on this site, is not meant to convince the majority of Blub programmers that they should be using Lisp, but to encourage the few who have an appreciation for the world’s greatest programming language to look even deeper.

Here’s a quote from the first chapter for those who have long suspected that there is more to programming than the hype of formal object systems we’ve all been subjected to for the past few decades:

“Object systems are a formalisation of a subset of let and lambda combinations, sometimes with gimmicks like inheritance bolted on. Because of this, Lisp programmers often don’t think in terms of classes and objects. Let and lambda are fundamental; objects and classes are derivatives. As Steele says, the ‘object’ need not be a primitive notion in programming languages. Once assignable value cells and good old lambda expressions are available, object systems are, at best, occasionally useful abstractions and, at worst, special-case and redundant.”
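
To make the quote concrete, here is a minimal sketch of the idea (my own, in Clojure rather than the book’s Common Lisp): an “object” built from nothing but a let and a lambda, with an atom standing in for Steele’s assignable value cell:

;; A counter "object" from let and lambda alone -- no class in sight.
(defn make-counter []
  (let [n (atom 0)]           ; private state captured by the closure
    (fn [] (swap! n inc))))   ; the lone "method"

(def c (make-counter))
(c) ;=> 1
(c) ;=> 2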

Buy the book. Programming language choice matters.

“Introduction to Metamathematics” by Stephen Cole Kleene

Not much more needs to be said about this one.

“The Stuff of Thought: Language as a Window into Human Nature” by Steven Pinker

From the Washington Post:

“Language comes so naturally to us that it’s easy to believe there’s some sort of intrinsic logic connecting the thing and its name, the signifier and the signified. In one of Plato’s dialogues, a character named Cratylus argues that “a power more than human gave things their first names.”

But Cratylus was wrong. Human language is an emanation of the human mind. A thing doesn’t care what we call it. Words and their rules don’t tell us about the world; they tell us about ourselves.

That’s the simple premise behind Steven Pinker’s latest work of popular science. According to the Harvard psychologist, people are “verbivores, a species that lives on words.” If you want to understand how the brain works, how it thinks about space and causation and time, how it processes emotions and engages in social interactions, then you need to plunge “down the rabbit hole” of language. The quirks of our sentences are merely a portal to the mind.

In The Stuff of Thought, Pinker pitches himself as the broker of a scientific compromise between “linguistic determinism” and “extreme nativism.” The linguistic determinists argue that language is a prison for thought. The words we know define our knowledge of the world. Because Eskimos have more nouns for snow, they are able to perceive distinctions in snow that English speakers cannot. While Pinker deftly discredits extreme versions of this hypothesis, he admits that “boring versions” of linguistic determinism are probably accurate. It shouldn’t be too surprising that our choice of words can frame events, or that our vocabulary reflects the kinds of things we encounter in our daily life. (Why do Eskimos have so many words for snow? Because they are always surrounded by snow.) The language we learn as children might not determine our thoughts, but it certainly influences them.

Extreme nativism, on the other hand, argues that all of our mental concepts — the 50,000 or so words in the typical vocabulary — are innate. We are born knowing about carburetors and doorknobs and iPods. This bizarre theory, most closely identified with the philosopher Jerry Fodor, begins with the assumption that the meaning of words cannot be dissected into more basic parts. A doorknob is a doorknob is a doorknob. It only takes Pinker a few pages to prove the obvious, which is that each word is not an indivisible unit. The mind isn’t a blank slate, but it isn’t an overstuffed filing cabinet either.

So what is Pinker’s solution? He advocates the middle ground of “conceptual semantics,” in which the meaning of our words depends on an underlying framework of basic cognitive concepts. (As Pinker admits, he owes a big debt to Kant.) The tenses of verbs, for example, are shaped by our innate sense of time. Nouns are constrained by our intuitive notions about matter, so that we naturally parcel things into two different categories, objects and substances (pebbles versus applesauce, for example, or, as Pinker puts it, “hunks and goo”). Each material category comes with a slightly different set of grammatical rules. By looking at language from the perspective of our thoughts, Pinker demonstrates that many seemingly arbitrary aspects of speech, like that hunk and goo distinction, aren’t arbitrary at all: They are byproducts of our evolved mental machinery.

Pinker tries hard to make this tour of linguistic theory as readable as possible. He uses the f-word to explore the topic of transitive and intransitive verbs. He clarifies indirect speech by examining a scene from “Tootsie,” and Lenny Bruce makes so many appearances that he should be granted a posthumous linguistic degree. But profanity from Lenny Bruce can’t always compensate for the cryptic vocabulary and long list of competing ‘isms. Sometimes, the payoff can be disappointing. After a long chapter on curse words — this book deserves an “explicit content” warning — Pinker ends with the banal conclusion that swearing is “connected with negative emotion.” I don’t need conceptual semantics to tell me that.

The Stuff of Thought concludes with an optimistic gloss on the power of language to lead us out of the Platonic cave, so that we can “transcend our cognitive and emotional limitations.” It’s a nice try at a happy ending, but I don’t buy it. The Stuff of Thought, after all, is really about the limits of language, the way our prose and poetry are bound by innate constraints we can’t even comprehend. Flaubert was right: “Language is a cracked kettle on which we beat out tunes for bears to dance to, while all the time we long to move the stars to pity.”

So you say that programming language choice does not matter …

One popular interview question that never dies: write some code to reverse a singly linked list.
Understanding the problem is one thing, but in my experience as an interviewer, getting it right in an imperative language is quite a feat. Most candidates take 15-20 minutes to get it completely correct, and sometimes even longer.

Here’s a quick solution (in C) that took me 10 minutes:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LIST_SIZE 10000000
#define SHOW_SIZE 5


typedef struct node {
  struct node* next;
  int val;
} node;


int main(void)
{
  int i;
  clock_t start, end;
  node* head = malloc(sizeof(node));
  node* cur = head;
  node* cur2 = head;
  node* prev = NULL;
  node* next = NULL;

  // init linked list
  printf("         linked list : ");
  for (i = 1; i <= LIST_SIZE; i++)
  {
    cur->val = i;
    if (i <= SHOW_SIZE) printf("%d -> ", cur->val);
    if (i != LIST_SIZE) cur->next = malloc(sizeof(node));
    else cur->next = NULL;  // terminate the list; otherwise the last
                            // node's next pointer is uninitialized
    cur = cur->next;
  }
  printf(" ... (%d more)\n", LIST_SIZE - SHOW_SIZE);

  // reverse linked list
  start = clock();
  while (cur2 != NULL)
  {
    next = cur2->next;  // remember the rest of the list
    cur2->next = prev;  // point the current node backwards
    prev = cur2;        // step prev forward
    cur2 = next;        // step cur2 forward
  }
  node* head2 = prev;   // prev is now the head of the reversed list
  end = clock();

  printf("reversed linked list : ");
  for (i = 1; i <= LIST_SIZE; i++)
  {
    if (i <= SHOW_SIZE) printf("%d -> ", head2->val);
    head2 = head2->next;
  }
  printf(" ... (%d more)\n", LIST_SIZE - SHOW_SIZE);

  printf("Elapsed time: %.3f msecs\n",
         (double)(end - start) * 1000.0 / CLOCKS_PER_SEC);
  return 0;
}

The code is fairly straightforward for anyone familiar with C. Think about it, though: when someone is under pressure to answer this question, coding it up on paper for the interviewer, there are about 50 lines of code in which to go wrong.
This is the imperative paradigm that most programmers live with on a daily basis, and which they must prove they can handle in interviews every day around the world. To make it worse, this is also the only paradigm that most interviewers really know.

Now, the equivalent in a functional language (here, Clojure) is a one-liner:

(let [nums (range 1 1e6)]
  (time (take 5 (reduce (fn [rlst item] (cons item rlst)) '() nums))))
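
Incidentally, clojure.core/reverse is, as far as I recall, defined as essentially this same reduce of conj onto an empty list, so the idiomatic version is shorter still:

(let [nums (range 1 1e6)]
  (time (take 5 (reverse nums))))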

At the low level of a developer’s daily experience with imperative languages, how many bugs happen as a result? What about lost time and individual productivity?
Moving up a level: what does this do to a team’s productivity? What is the ‘new feature vs. fire drill’ ratio for a team forced by academia and industry to worship at the altar of imperative languages?
At the macro level, translate all this into costs in time and lost revenue for a company. What percentage of revenue and profit is lost to being stuck with this paradigm?
Middle management’s sales-pitchy MBA approach to software developer fungibility, and hence its love of lowest-common-denominator languages, costs companies untold millions. It has also been much of the cause of a generation of below-average programmers.
(The cancer that is the MBA club in the world of software development is a topic for another essay.)

Language matters at all levels.

Convex Hull with Akl-Toussaint heuristics

;; Calculates the convex hull of a set of points using the Graham scan
;; algorithm.

(ns i27.geom.convex-hull
  (:use [i27.geom.misc :only (angle cross-product)]))


(defn presort-points
  "Presorts the cartesian points in descending order by angle.
   The point with the lowest y value is last in the vector."
  [pts]
  (let [pts (distinct pts)] ;; points must be distinct or the scan errors out
    (letfn [(find-start [[x1 y1 :as pt1] [x2 y2 :as pt2]]
              (cond (< y1 y2) pt1
                    (= y1 y2) (if (< x1 x2) pt1 pt2)
                    :else pt2))]
      (let [pt (reduce find-start pts)
            npts (remove (fn [[x y]]
                           (and (= (first pt) x) (= (last pt) y))) pts)]
        (conj (apply vector
                     (sort (fn [pt1 pt2]
                             (> (angle pt pt1) (angle pt pt2))) npts))
              pt)))))


(defn convex-hull
  "Calculates the convex hull given a set of points in the cartesian plane.
   Uses the Graham scan algorithm. Performance is O(n lg n) time."
  [pts]
  (when (>= (count pts) 3)
    (let [start (apply vector
                       (reverse (subvec pts (- (count pts) 3) (count pts))))
          other (reverse (subvec pts 0 (- (count pts) 3)))]
      (letfn [(popstack [stk pt]
                (if (not (> (cross-product
                             (last (pop stk)) (last stk) pt) 0))
                  stk
                  (recur (pop stk) pt)))
              (scan [res pt]
                (conj (popstack res pt) pt))]
        (reduce scan start other)))))


(defn elim-points
  "Prepares the data with the Akl-Toussaint heuristic before passing it to a
   convex hull function. Runs in O(n) time regardless of the convex hull
   algorithm used."
  [pts]
  (when (>= (count pts) 4)
    (letfn [(find-quad
             [[[x1 y1 :as pt1] [x3 y3 :as pt3]
               [x2 y2 :as pt2] [x4 y4 :as pt4] :as quad] [xc yc :as cpt]]
             [(if (> x1 xc) cpt pt1) (if (> y3 yc) cpt pt3)
              (if (< x2 xc) cpt pt2) (if (< y4 yc) cpt pt4)])]
      (let [poly (distinct (reduce find-quad (take 4 pts) pts))
            ;; generate the line segments that form the convex polygon
            lines (conj (reduce (fn [lines pt] (conj lines [(last (last lines)) pt]))
                                [[(first poly) (second poly)]] (rest (rest poly)))
                        [(last poly) (first poly)])
            ;; filter out all points that fall inside the convex polygon
            pts (filter
                 (fn [tpt]
                   (not (reduce (fn [b v] (and b (neg? v))) true
                                (map (fn [pt]
                                       (cross-product tpt (first pt) (last pt)))
                                     lines))))
                 pts)]
        pts))))
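
For reference, here is a hypothetical driver for producing timings like those below. It is my sketch, not the actual test harness, and the random point generation (coordinate range guessed from the sample hull output) is an assumption:

;; Hypothetical driver -- a sketch, not the actual test harness.
;; Coordinate range guessed from the sample output below.
(def pts (vec (repeatedly 1000000
                          (fn [] [(rand-int 1000000) (rand-int 1000000)]))))

(time
 (let [hull (convex-hull (presort-points (elim-points pts)))]
   (println (take 5 hull) "-" (count hull) "points")))

For the run without the heuristic, drop the elim-points step and pass pts directly to presort-points.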

Timing without heuristics:

java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

generating 1000000 random points ...
"Elapsed time: 896.39526 msecs"
sorting 1000000 points ...
"Elapsed time: 52142.589419 msecs"
calculating convex hull for 1000000 points ...
"Elapsed time: 5370.579308 msecs"
[[246336 0] [516315 0] [332671 0] [920929 0] [992271 1]] - 35 points
"Elapsed time: 57547.947404 msecs"

Timing with heuristics:

java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

generating 1000000 random points ...
"Elapsed time: 897.92648 msecs"
heuristic elimination of points ...
"Elapsed time: 264.116189 msecs"
sorting 510685 points ...
"Elapsed time: 25988.023392 msecs"
calculating convex hull for 510685 points ...
"Elapsed time: 3159.032778 msecs"
[[246336 0] [516315 0] [332671 0] [920929 0] [992271 1]] - 35 points
"Elapsed time: 32544.642604 msecs"

The code.