O-notation considered harmful (use Analytic Combinatorics instead)

From Robert Sedgewick’s lecture “Analytic Combinatorics, Part I – A Scientific Approach”.
https://class.coursera.org/introACpartI-001/lecture/index

For a long time I’ve been skeptical about the classic approach of the “Theory of Algorithms” and its misuse and misunderstanding by many software engineers and programmers. The Big O, Big Theta (Θ) and Big Omega (Ω) notations are often not useful for comparing the performance of algorithms in practice. They are often not even useful for classifying algorithms. What they are useful for is determining the theoretical limits of an algorithm’s performance: its lower bound, its upper bound, or both.
I’ve had to argue this point painfully and carefully a few times as an interviewee and many more times as part of a team of engineers. In the first case it can mean the difference between impressing the interviewer and missing out on a great career opportunity, simply because of the ignorance and/or incorrigibility of the person interviewing you. In the latter it can mean months or even years of wasted implementation effort, or, in the worst case, a financial failure.

In practice, the O-notation approach to algorithmic analysis can be quite misleading. Quick Sort vs. Merge Sort is a great example. By worst-case O-notation, Quick Sort is classified as quadratic, O(n²), and Merge Sort as log-linear, O(n log n). In practice, however, Quick Sort often performs about twice as fast as Merge Sort and is also far more space efficient. As many folks know, this has to do with the typical inputs these algorithms see in practice. Most engineers I know would still argue that Merge Sort is the better solution, and apparently Sedgewick gets the same argumentative response even though he is an expert in the field. In the lecture he kindly says the following: “… Such people usually don’t program much and shouldn’t be recommending what practitioners do”.
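
To make a claim like “twice as fast” checkable rather than anecdotal, here is a minimal benchmark sketch (plain Python with textbook implementations of both sorts on random integers; the function names and problem size are my own choices, and on an interpreted runtime the exact ratio will vary, so treat the numbers as illustrative only):

    import random
    import time

    def quicksort(a):
        """In-place quicksort with a random pivot (textbook Lomuto partition)."""
        def partition(lo, hi):
            k = random.randint(lo, hi)        # random pivot guards against unlucky orderings
            a[k], a[hi] = a[hi], a[k]
            pivot, i = a[hi], lo
            for j in range(lo, hi):
                if a[j] < pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            return i
        def sort(lo, hi):
            while lo < hi:
                p = partition(lo, hi)
                if p - lo < hi - p:           # recurse on the smaller side, loop on the larger
                    sort(lo, p - 1)
                    lo = p + 1
                else:
                    sort(p + 1, hi)
                    hi = p - 1
        sort(0, len(a) - 1)

    def mergesort(a):
        """Top-down mergesort returning a new sorted list (textbook version)."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = mergesort(a[:mid]), mergesort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:]); out.extend(right[j:])
        return out

    if __name__ == "__main__":
        n = 200_000
        data = [random.randrange(n) for _ in range(n)]

        a = data[:]
        t0 = time.perf_counter(); quicksort(a); t_quick = time.perf_counter() - t0
        t0 = time.perf_counter(); mergesort(data[:]); t_merge = time.perf_counter() - t0

        print(f"quicksort: {t_quick:.3f}s   mergesort: {t_merge:.3f}s")

Running something like this on your own data, rather than arguing from the worst-case classification alone, is the kind of scientific habit the rest of this post is advocating.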

There are many more examples where practical behavior does not align with what O-notation suggests. Moreover, detailed analysis of algorithmic performance simply takes too long to be useful in practice most of the time. So what other options do we have?

There is a better way: an emerging science called “Analytic Combinatorics”, pioneered by Robert Sedgewick and the late Philippe Flajolet over the past 30 years, with the first (and so far only) text, Analytic Combinatorics, appearing in 2009. This approach is based on the scientific method and provides an accurate and more efficient way to determine the performance of algorithms (and to classify them correctly). It even makes it possible to reason about an algorithm’s performance based on real-world input. It also allows for the generation of random data for a particular structure or structures, among other benefits.
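
As a taste of what that extra precision looks like: where O-notation collapses both sorts into “O(n log n) comparisons” (worst case aside), the analytic approach keeps the leading coefficients. The standard leading-term (“tilde”) results for comparisons on random inputs of size N are roughly as follows (quoted from the analysis-of-algorithms literature, with lower-order terms dropped; the derivations are not reproduced here):

    % Average-case comparison counts, leading terms only
    C_{\mathrm{quicksort}}(N) \sim 2N \ln N \approx 1.39\, N \lg N
    \qquad
    C_{\mathrm{mergesort}}(N) \sim N \lg N

Coefficients like these, combined with the relative cost of a comparison versus a data move on real hardware, are what let you predict the practical gap between two algorithms instead of merely bounding their growth rates.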

For an introduction by the same authors there is An Introduction to the Analysis of Algorithms (or the free PDF version) and Sedgewick’s video course. Just to make it clear how important this new approach is going to be to computer science (and other sciences), here’s what another CS pioneer has to say:

“[Sedgewick and Flajolet] are not only worldwide leaders of the field, they also are masters of exposition. I am sure that every serious computer scientist will find this book rewarding in many ways.” —From the Foreword by Donald E. Knuth

 

16 thoughts on “O-notation considered harmful (use Analytic Combinatorics instead)”

  1. Great Post!

    Asymptotically, in the worst case, Merge Sort may be faster than Quick Sort; but how often do we approach the asymptotes?

    I started having this doubt when they taught us these notations in college.

  2. Take the Quick vs. Merge sort example: understanding the differences exactly, reasoning about those differences (you need the tools for that), and asking questions that expose the mechanical (in)efficiencies of the implementations (e.g. when is quick sort better than bubble sort?) are what allow one to make the leaps of intuition that lead to things like Introsort.

    So don’t pooh-pooh such things… It is true that the various complexity notations are initially difficult to get your head around (especially if you don’t come from a deep math/functional analysis background), but they are good yardsticks for reasoning and for setting direction.

    That said, nothing beats empirical analysis, but any tool that reduces the effort is worth learning.

  3. So many grammatical errors… my eyes…

    TL;DR: Sedgewick has a new book out. (You can get it for free here: http://algo.inria.fr/flajolet/Publications/book.pdf)
    OP has apparently failed numerous job interviews due to erroneously claiming “Quicksort is better than mergesort/heapsort/introsort,” and refusing to acknowledge asymptotic performance considerations. OP takes Sedgewick’s book as validation of his position. OP is still wrong.

  4. Quick sort is often used over merge sort not because there’s some magic gap between theory and practice that somehow makes an O(n^2) algorithm faster than an O(n log n) algorithm, but because we can easily show that quick sort runs in O(n log n) time for most inputs, that randomised quick sort runs in expected O(n log n) time with high probability, and that quick sort usually has a better constant factor on the machines we use.

    A quick sort whose pivot selection can be easily exploited to make the algorithm run in O(n^2) time is quite dangerous, and has been used as a DoS attack vector in the past (a sketch of this failure mode follows this thread).

    You’re missing the point of this type of analysis entirely, and being extremely condescending while doing so.

    • Unless the majority of the data is already sorted and it’s just a matter of sorting the new entries into the pre-existing structure; then insertion sort works a lot faster than quick sort.

      Try it for yourself: http://www.sorting-algorithms.com/

      Case in point, algorithm complexity analysis is heavily context sensitive, regardless of what established theory claims.

      I’m currently taking Robert’s Algorithms 1 course on Coursera but I’m looking forward to seeing what he has to say about combinatorics once I work my way up to it.
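
To illustrate the two points raised in the thread above, the exploitable worst case of a naively pivoted quicksort and the strength of insertion sort on nearly sorted data, here is a small sketch (Python, my own toy implementations and input sizes; the timings only show the shape of the behavior, not precise costs):

    import random
    import time

    def quicksort_first_pivot(a):
        """Quicksort that always pivots on the first element. Written with an
        explicit stack so the sorted-input worst case does not overflow
        Python's recursion limit."""
        stack = [(0, len(a) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot, i = a[lo], lo
            for j in range(lo + 1, hi + 1):
                if a[j] < pivot:
                    i += 1
                    a[i], a[j] = a[j], a[i]
            a[lo], a[i] = a[i], a[lo]
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))

    def insertion_sort(a):
        """Standard insertion sort: close to linear when the input is nearly sorted."""
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def timed(sort, a):
        t0 = time.perf_counter()
        sort(a)
        return time.perf_counter() - t0

    if __name__ == "__main__":
        n = 5_000                              # kept small: the quadratic case is genuinely slow
        print("already-sorted input (the exploitable worst case):")
        print("  first-pivot quicksort:", timed(quicksort_first_pivot, list(range(n))))
        print("  insertion sort:       ", timed(insertion_sort, list(range(n))))

        nearly_sorted = list(range(n))
        for _ in range(100):                   # perturb a handful of entries
            i, j = random.randrange(n), random.randrange(n)
            nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]
        print("nearly-sorted input:")
        print("  insertion sort:       ", timed(insertion_sort, nearly_sorted))

On sorted or adversarially ordered input the first-pivot quicksort does on the order of n²/2 comparisons, which is the behavior an attacker can force if the pivot choice is predictable, while insertion sort handles the nearly sorted case in close to linear time.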

  5. Pingback: The big-O notation is a teaching tool

  6. That paper seems much harder to digest than big O notation ever could be, whether or not it’s more accurate. O-notation estimates can be computed in your head, from pseudocode, and even from actual code in a lot of cases. If it’s going to be any more complex than that, then it’s going to need tool support (i.e. simulate the big architecture diagram; the atomic blocks in the block diagrams are often stupidly complex in implementation, where an overly detailed performance analysis would swing all over the place as the code changes).

    The poster who says that in the end “it’s scalability that matters” has it right, though. You can probably safely assume a reasonable upper bound on available memory in a single machine. But web-scale businesses just keep adding hard drives to many thousands of nodes and basically never delete anything, with terabytes of data coming in every single day. It’s probably the power bill that limits the amount of disk space.

  7. Hi there, AC has nothing to do with computing expected performance – in the video, Sedgewick explicitly states that O is dangerous because it hides the coefficient of the dominant term. He does not say that O is dangerous because it can only be used to bound the worst case; in fact, at no point in AC nor AoA does he talk about the expected behavior of algorithms. Therefore, I think you may be misleading your readers by contending that a valid example of how O is dangerous is that, under a uniform distribution of inputs, quicksort runs in expected (or amortized) O(n log n) time but has worst-case behavior of O(n^2), hence O is a misleading tool. Both are valid bounds on the algorithm under different use cases, and O has zero relevance in your argument.

    I think what you are really contending is that focusing on worst-case behavior is dangerous. This is true, but even from this perspective, you are going to compute an upper bound on the expected behavior and attempt to bound it as tightly as possible, so even amortization requires O or some other bounding operator. Sedgewick merely argues that the bounding/equivalence relation we should use is (f ~ g iff lim f/g = 1) rather than (f = O(g) iff limsup f/g is bounded).
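
For readers who have not seen the two relations written side by side, this is the distinction being drawn (a standard formulation, not a quote from the lecture or the book):

    f(N) \sim g(N) \;\iff\; \lim_{N \to \infty} \frac{f(N)}{g(N)} = 1
    \qquad \text{versus} \qquad
    f(N) = O(g(N)) \;\iff\; \limsup_{N \to \infty} \left| \frac{f(N)}{g(N)} \right| < \infty

The first relation preserves the leading coefficient, so 2N ln N and N lg N remain distinguishable; the second discards it, so both collapse to O(N log N).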
