Why I Don't Regard Algorithm Proof as Useful

Friday, August 21, 2009 12:33 AM

Okay, now that I've swapped insults with Frans (in fairness, I started it) I think I should explain my position on provability more carefully.  Some context: I'm a mathematician by background, I came to computing relatively late, and am extremely aware of the gap between the two disciplines. 

Let's be clear, I actually agree with a fair amount of what Frans said in the original article.  I'm certainly not advocating a plough-in "RAD" approach to development, I just think that the concept of proof in computing isn't particularly useful. 

The Importance of Algorithms

Let me start with sorting.  There are a number of obvious known algorithms for sorting on one thread:

  • Bubble sort
  • Quick sort
  • Merge sort

Of these, one is stupid, one has the wrong complexity but low memory usage and one is fast but uses more memory.  Which one should I implement?

Well, actually, usually the answer is: I shouldn't bother.  I'll just type "orderby person.Name" and get on with my life.  I've got a bunch of algorithmic knowledge in my head and it's about as useful as my encyclopedic knowledge of Babylon 5.  Frameworks like .NET and, frankly, LLBLGen have been implementing high quality algorithms for most use cases, which means that most of the time, I don't need to bother.  Sure, there's very few general frameworks for graph traversal, but that's mostly because it's of specialized utility.

I used to work at a dot-com with over 50 developers.  Every so often, something somewhat mathematical came across our desks.  When this happened, it got passed to me.  I estimate that I did something like this for about two days in every three months.  The truth is, there are many environments in which this stuff doesn't come up very often.

The Problem of Proving Anything

Let's first deal with a philosophical point: what do you mean by proof?  As a former mathmo, I have a pretty strong idea of what that means.  It means 100%.  Sadly, C# isn't a proven language, and Windows isn't a proven operating system.  In terms of formal specification and verification, hardware is ahead of the game, but your average processor is chock-full of errata.  In general terms, we've no good model for proving imperative style code, even if there have been recent breakthroughs.

This sounds all very academic, but it has some real consequences.  One of the principal ones is that, even if you prove something, proving that your implementation matches your conceptual model isn't really possible.  There's a huge gap between your pen and paper and the keyboard.

Now, previously I talked about how algorithms were a relatively small part of the average development workload.  Problem is, algorithms are actually the things that are easiest to deal with using formal proof techniques, because they're small and they're abstract.  Your average workflow is harder.

This is all assuming you have formal training in the discipline that this involves.  Most people don't.  Your proof is only as good as your confidence that you haven't made a mistake.

What Are You Proving?

I referred to proof before as specification and verification.  So far I've talked about verification, the part that most people regard as proof, but specification is just as important.  Let me tell another abstract story: Roger Needham was pretty much the world authority on authentication.  He pioneered third-party authentication protocols and helped develop the world's first authentication logic, a system for verifying authentication protocols.  You might think that a paper such as that would be the last word on the problem.  The problem is in the specification.  All the logic deals with is verifying that specific types of attack are not possible.  It didn't necessarily protect against guessing attacks on poorly chosen secrets, which is unfortunate since pretty much all passwords are poorly chosen. 

The problem here is specification.  Your employer or your client is only interested in "Is it secure?" or even more nebulous concepts.  The only way to formally specify "secure" is to restrict your attention.  Sometimes things slip through the gaps.  Famously, the initial authentication logic failed to spot a password sent in plain text.  Needham always argued that this wasn't a problem with the logic because it was a violation of the principles, but even that argument should illustrate the practical limitations of the techniques of formal proof.

Again, this isn't a an academic objection.  About six months ago, I put live an extremely sensitive piece of code.  It dealt with something that was actually appropriate to this kind of analysis.  I and several others spent a lot of time testing it, trying to poke holes in it.  There's a raft of automated tests for it.  We even ran it in parallel to the live systems for months to check that it behaved.

It failed a month after it went live.  Why?  Because we missed something.  The system worked perfectly, but it was only part of a larger system.  That larger system had complexities we hadn't captured that caused it to fail.  In short, we didn't ask the right questions.

Unattainable Goals

So, what's the alternative?  Well, I'd still favour TDD.  I know Frans wasn't arguing against this, but I think that talking about provable algorithms is a distraction.  Your average developer, when being told that he needs to prove things, won't outline the arguments that I've just given, but he'll know in his gut:

  • It doesn't seem very relevant to what he's doing.
  • It sounds impossible
  • It doesn't sound like what he does for a living

On the other hand, approaching things in a story-based fashion, where cussed exceptional cases are built into the test framework, that's something he can see is working. 

Your average developer wouldn't be able to read the phrase transitive closure without reaching for a dictionary (or wikipedia).  What he found there would be unlikely to enlighten him.  In fact, most developers don't really ever analyze complexity.  I used to regard that as shocking.  These days, I've come to the conclusion that it rarely makes them less productive.  Complexity does occasionally come up and bite them, but it's rare. 

I'm not arguing against learning, or against continuous improvement.  But you've got to have priorities and there are a phenomenal number of things to learn.  For instance, if you don't know anything about graph algorithms or the SOLID principles, I'd say you were better off studying and practicing the latter.


Frameworks are different, they're useful precisely because they do things that the average developer would find at best time-consuming and at worst error-prone or impossible.  They're the exception to the build vs buy rule: you're building so that someone else can buy.  For these reasons, I don't necessarily think the best practices for an ORM developer are necessarily the same as those for a LoB developer.  In short:

  • I don't think algorithms are that important in LoB development.  Those that are common are already written.
  • Formal verification isn't a error-free as is first appears.
  • You can only verify what you specify.  Errors in specification can and do happen.
  • Finally, for the reasons above, I think that the concept of proving your algorithms is of niche utility.  TDD is the tool I'd reach for first, not formal proof.


Technorati Tags: ,,
No comments posted yet.
Something to add?

Talking sense? Talking rubbish? Something I'm missing? Let me know!

Fields denoted with a "*" are required.

 (will not be displayed)

Please add 7 and 4 and type the answer here:

Preview Your Comment