November 2009 Blog Posts

David Wheeler (1927-2004) was something of a computer science folk hero around Cambridge.  You think object-orientation was a neat idea?  He invented the subroutine.  Academics would tell stories of having conversations in which they'd describe some particularly thorny problem they were trying to solve, only to be met with, "Oh yes, I remember that:  I never got around to writing a paper, but I'll give you my notes on how I dealt with it."

But he's most famous for saying "Any problem in computer science can be solved with another layer of indirection."  You'd have thought inventing functions would be more memorable, but it turns out that pithy remarks win out where posterity is concerned.

SOLID is all about abstraction.  Let me express the principles in those terms:

  • Open/Closed:  Make stable abstractions
  • Liskov:  Don't break your abstractions
  • Dependency Inversion:  Don't take dependencies on concretions, including on object creation
  • Interface Segregation:  Make your abstractions as small as possible

Now, abstraction is useful precisely because it introduces indirection.   Single Responsibility isn’t explicitly about abstraction, but applying it will introduce more indirection into your code.  It’s exactly this that delivers the benefits of SOLID.

But that will usually create another problem

Of course, Prof. Wheeler didn't end the quote there.  Someone misquoted it as “Any problem in computer science can be solved with another layer of abstraction, except too many layers of abstraction.”  Multiple layers of indirection can be quite painful to navigate, as anyone who’s stared at Castle Windsor’s code for the first time will attest.*  This is a huge barrier to acceptance.  I’m proud to say that I’ve managed to convince a number of developers of the utility of SOLID, but the fact remains that many just see it as unnecessary complexity.  One complaint that I often encounter basically comes down to “F12 doesn’t work.”

Now, someone smarter and funnier than I am described not using dependency inversion as like soldering a lamp directly into the wall.  This sounds ridiculous but let’s talk a bit about the benefits of soldering it.  Well, figuring out your wiring is really easy.  It never changes** and tracing your circuits is really easy.  If you’re dealing with the plug next to the sofa, it’s got a reading lamp in it.  It’s never a phone charger.  When you switch on the reading lamp, you’re guaranteed it’s going to work and you’re not going to have to mess around unplugging whatever the last guy was using.  SOLID code, by contrast, looks like a tangle of wires the first time you see it.

Tracing the circuit in old-school code is pressing F12 in Visual Studio.  SOLID breaks F12.  You know the old saying that when all you’ve got is a hammer everything looks like a nail?  Well, imagine you’ve got a screw and, instead of being able to whack it in, your hammer didn’t work at all.  You might well come to the conclusion that there was something seriously wrong with this funny-looking nail.  If you don’t have a key press for “go to implementation of method” like in ReSharper, you’re going to find that the only way to trace the execution flow is with a debugger.  Equally, if “Find Usages” doesn’t understand that a method may be called because it’s an implementation of an interface, you’re going to find SOLID code harder to navigate.  Never mind the cost of writing all of those single responsibility objects when you can’t press “Alt-Enter Enter” for the obvious bits.  And I’ve only just scratched the surface.

Yes, you can do SOLID in Notepad and No, this isn’t the only barrier to adoption, or even the biggest.  Still, it’s well worth bearing in mind that tooling matters.  I kid you not, I’ve seen code get rewritten from SOLID style to something that works better with F12.  If you want to get people into proper object-orientated design, you could do worse than starting with getting ReSharper on their desks.

[Disclaimer: I have no financial or personal interest in JetBrains; they’ve never even offered me free stuff.]

*Or StructureMap, or NHibernate, or pretty much any well written open source project.

**at least, not until requirements change, but that’s the point...

I'll be honest, the main reason for the last post was to make sense of this one.  Consider the following code:

public User UserByName(string name) {
    return session.Linq<Person>()
        .DistinctRoot()
        .Where(u => u.WindowsUserName == name)
        .FirstOrDefault();
}

It's exactly what you'd hope for.  Nice and explicit: you want the first person whose windows user name matches 'name' and to pull back the person's roles at the same time.  Because you understand how eager fetching works, you've slapped a distinct root in there too.

Pity the code's wrong.  The gotcha is the way that NHibernate.Linq interprets FirstOrDefault.  To be clear, this isn't a bug, it's definitely the right behaviour.  FirstOrDefault translates to a "top 1" in SQL Server (or a Limit in others).  DistinctRootEntityResultTransformer works after the query has run.

So, you will get at most one Person object back, but you'll also get at most one Role back, which would undoubtedly lead to problems elsewhere in your code.  Try writing an example program to demonstrate this and get it to print the SQL you run.

So, how do you deal with it?  Well, you need to stop FirstOrDefault getting translated into the SQL.  So we use my favourite LINQ defeater:  ToList. 

public User UserByName(string name) {
    return session.Linq<Person>()
        .DistinctRoot()
        .Where(u => u.WindowsUserName == name)
        .ToList().FirstOrDefault();
}

Now that code actually does what you wanted.  Of course, there's a catch: if there really were two people with the same windows user name, it would fetch them both.  But at least your code is now correct.
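The effect is easy to reproduce without NHibernate.  What follows is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module, a made-up person/role schema and hand-written SQL rather than anything NHibernate.Linq generates, with LIMIT 1 standing in for "top 1" and a small distinct_root function standing in for the result transformer.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, windows_user_name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, person_id INTEGER, name TEXT);
    INSERT INTO person VALUES (1, 'alice');
    INSERT INTO role VALUES (1, 1, 'admin'), (2, 1, 'trader'), (3, 1, 'auditor');
""")

# Eager fetch of roles: one joined row per (person, role) pair.
JOINED = """
    SELECT p.id, p.windows_user_name, r.name
    FROM person p LEFT JOIN role r ON r.person_id = p.id
    WHERE p.windows_user_name = ?
    ORDER BY r.id
"""


def distinct_root(rows):
    """Collapse joined rows into one entry per person, after the query has run."""
    people = {}
    for person_id, user_name, role in rows:
        person = people.setdefault(person_id, {"name": user_name, "roles": []})
        if role is not None:
            person["roles"].append(role)
    return list(people.values())


def first_or_default(items):
    return items[0] if items else None


# Limit in SQL, then collapse: the limit has already thrown the other roles away.
limited = first_or_default(
    distinct_root(conn.execute(JOINED + " LIMIT 1", ("alice",)).fetchall()))

# Fetch every row, collapse, then take the first in memory.
fetched = first_or_default(
    distinct_root(conn.execute(JOINED, ("alice",)).fetchall()))

print(limited["roles"])  # ['admin']
print(fetched["roles"])  # ['admin', 'trader', 'auditor']
```

Both versions return a single person; only the second returns that person with all of their roles.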


I use these a lot.

    public static class NHLinqHelper {
        public static INHibernateQueryable<TValue> DistinctRoot<TValue>(this INHibernateQueryable<TValue> query) {
            query.QueryOptions.RegisterCustomAction(c => c.SetResultTransformer(new DistinctRootEntityResultTransformer()));
            return query;
        }

        public static INHibernateQueryable<TValue> Cached<TValue>(this INHibernateQueryable<TValue> query) {
            query.QueryOptions.RegisterCustomAction(c => c.SetCacheable(true));
            return query;
        }
    }

This just makes the most common query options available fluently.  It isn't perfect, in that using standard LINQ operators changes the declared type, which means you need to set these at the start, not the end of the query.  But if you use NHLinq, you're used to that.


OK, this isn't actually part of my SOLID Principles series (always a pity when your best content is a YouTube link) but a response to Ryan's article on Los Techies.  I've not really got my head around the way that unit testing works in Python, but I get that absolutely everything being overridable on an instance or class basis affects the approach.

Let's talk about Ryan's list.  Now, he argues that Python offers alternatives for the interface segregation principle, open/closed principle, and the dependency inversion principle.  I'm going to argue that the principles are actually the same, even if the practice is different.  (Like Ryan's article, pretty much everything said here applies to any dynamic language, but I'll talk about Python.)

Python and The Interface Segregation Principle

Well, making everything an interface might seem like a valid solution to the Interface Segregation Principle but it's a bit weird.  As I mentioned in my original article, the whole point is the "fine-grained" part.  With Python, the interface that a client consumes is exactly the methods it calls.  In that respect, all Python code respects ISP by default. 
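A minimal sketch of that point, with hypothetical class and function names: the client below calls only play_game, so that one method is the whole interface it consumes, and an object that offers nothing else still satisfies it.

```python
class ChessGame:
    """Hypothetical game with a wide surface."""

    def play_game(self):
        return "white"

    def previous_moves(self):
        return ["e4", "e5"]


class CoinToss:
    """Hypothetical game that offers nothing except play_game."""

    def play_game(self):
        return "heads"


def announce_winner(game):
    # The only thing this client asks of its argument is play_game().
    return "winner: " + game.play_game()


print(announce_winner(ChessGame()))  # winner: white
print(announce_winner(CoinToss()))   # winner: heads
```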

The potential interface surface is fundamentally flexible.  Arguably that's a problem for ISP:  You can always just call another method if you want to.  I don't honestly think it matters though.

Ultimately, I don't think ISP is changed by Python, it's just kind of irrelevant, for better or worse.

Python and The Open Closed Principle

Well, the open closed principle is a goal, not a design practice, but let's take a look at the danger points:

  • You can't have non-virtual methods, so Python wins this hands down.
  • Your variable can't be made too specific, so you're safe there.
  • You can still compare against hard-coded values.  It's just as easy to get this wrong in Python as it is in C#.
  • Same holds true for Law of Demeter violations.  If you pass the wrong object around, your code will be just as fragile in Python as in C#.

Python certainly reduces the scope for some OC violations, but you've still got lots of rope to hang yourself.  I think you still need to bear the goal in mind.

Python and The Dependency Inversion Principle

Python doesn't provide an alternative to the dependency inversion principle, it just looks like it.  Now, DI isn't about using an IOC container (which is a slightly crazy/painful thing to do without a static type system), it's about decoupling.  Now, in Python you can override any function, including a class function (which a C# developer would describe as a static method) so everything's alright.

Except it isn't.

Let me give an example.  You go around your house welding lamps directly into the plug sockets.  This is equivalent to calling Lamp() directly from within the Socket's code.  Now, let's assume you wanted to change one Lamp to an XBox.  Well, you can always monkey patch the method so that it behaves like one.  Ugly, but possible.  Let's try something harder: change every Lamp to an XBox.  Not sure why you'd want to, but it's your house: you can just change the class to behave like an XBox.  Great.

Until your neighbour comes round and asks why all of his lamps just turned into XBoxes.

Let's quote Ryan:

Now all calls in this runtime, from any module, that reference the Output class will use XmlOutput or HtmlOutput instead.

Yes, but what if I wanted only half of them?  Maybe there are Python techniques I don't know about (I'm barely competent in the language) but as I see it, I'm going to need to change the code.  I don't think that dependencies can "always" be injected.  They can only be done when it won't cause damage.  In his case, he's worrying about testability.  That's fine, but we all agree there's more to DI than testability.  You will definitely have fewer obvious problems, but if you don't pass things in using constructors and use abstract factories, you will still run into code fragility, even in a flexible language.
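The lamp example can be sketched in a few lines.  Every name here (Lamp, XBox, WeldedSocket, Socket) is hypothetical and not taken from Ryan's article; the point is only to contrast patching a class for the whole process with passing the dependency in through the constructor.

```python
class Lamp:
    def switch_on(self):
        return "light"


class XBox:
    def switch_on(self):
        return "game"


class WeldedSocket:
    """Creates its own Lamp, so the only way to change it is to patch Lamp."""

    def switch_on(self):
        return Lamp().switch_on()


class Socket:
    """Takes whatever appliance it is given through the constructor."""

    def __init__(self, appliance):
        self.appliance = appliance

    def switch_on(self):
        return self.appliance.switch_on()


# Monkey patching the class changes every WeldedSocket in the process,
# the neighbour's included.
mine, neighbours = WeldedSocket(), WeldedSocket()
original = Lamp.switch_on
Lamp.switch_on = XBox.switch_on
patched = (mine.switch_on(), neighbours.switch_on())
Lamp.switch_on = original  # undo the patch

# Injection changes only the sockets you choose.
injected = (Socket(XBox()).switch_on(), Socket(Lamp()).switch_on())

print(patched)   # ('game', 'game')
print(injected)  # ('game', 'light')
```

With the patch there is no way to say "only my sockets"; with the constructor argument, half of them is as easy as all of them.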

Python Still Needs SOLID

None of this is to disparage Python.  It's a cracking language with a great deal of flexibility and extremely productive.  But it's not the Holy Grail.  It's still perfectly possible to violate OCP and you can still find DI useful.

The interface segregation principle is regarded as the least important of the SOLID principles.  I think this is a matter of context.  If you're implementing IGame and don't need PreviousMoves functionality, you could always just throw a NotImplementedException and not worry about it.  Sure, you've violated Liskov quite badly by doing so, but not in the contexts that you actually care about.  The problems will start to develop as the code morphs and your broken abstractions start to matter.  It won't break you half as fast as not using an abstraction in the first place, but it will matter eventually.

Things get more interesting when we start talking SOA.  Now, the requirements of SOA are actually exactly the same in this context as ordinary code, it's just that interfaces, once published, are often set in stone.  This makes it much more important to pay attention to the requirements of the client.  The "client" is often a business process.  So, for instance, take an equities trading system.  The way an order looks to a trader is very different from the way it looks to the guy trying to settle it three days later.  The guy trying to report these trades for compliance purposes has another view, and the guy trying to value them for his risk analysis has another.  Interface segregation says that you shouldn't be passing around the same interface to all of those people.  You might still be thinking these should all be facades onto the same object, and they could be, but it's not necessarily the case.  These could be completely separate systems only connected by a messaging interface.  So, I'll finish up with my own corollary of the Interface Segregation Principle:

Unified Models are neither sensible nor desirable.

The interesting thing about the problems we encountered with Liskov is that they lead directly to the next principle.  The basic problem we found was that there are a lot of implicit assumptions that can be, and often are, made by the usage of an interface.  We also discussed the use of contracts to make some of those assumptions explicit, and the use of unit tests to help verify those assumptions.  However, we ignored the simplest solution of the lot: make fewer assumptions.  That's what the Interface Segregation Principle is about.

Let's go back to my naive understanding of object-orientation.  By now, I'd learned what the interface keyword did, so when I created a ChessGame class, I knew that I needed an IChessGame interface.  I was still wrong.  Let's think about it for a second: imagine I write a tournament class which plays a certain number of games and returns the winner.  There's nothing chess-specific here.  By using IChessGame, I'm still requiring anyone using this functionality to implement chess.  Which is a pity, because when you look at it, a draughts tournament works in exactly the same way. 

Get to the principle already

Here's the basic statement of the principle:

Make fine grained interfaces that are client specific.

Read that last bit again "client specific".  Let's say that I look at my previous code and say

  • Well, ChessGame inherits from Game
  • Game implements IGame
  • So I'll just change my code to use IGame.

Well, I've satisfied dependency inversion there, but I've completely missed the point when it comes to interface segregation.  Let's take a look at the IGame interface:

public interface IGame {
    IPlayer PlayGame();  // returns the winner
    IEnumerable<IMove> PreviousMoves { get; }
}

The tournament doesn't need to know about PreviousMoves.  It actually wants a smaller interface: one that just plays the game and returns the winner.  Call it ITournamentGame if you like.  Does the Game class or the ChessGame class implement this interface?  Doesn't matter.  What matters is that we've reduced the coupling of our code.
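Here is the same narrowing as a rough Python sketch, using typing.Protocol in place of a C# interface.  TournamentGame, Draughts and run_tournament are illustrative names, and "most wins takes the tournament" is an assumed rule, not something specified above.

```python
from collections import Counter
from typing import Protocol


class TournamentGame(Protocol):
    """The narrow interface the tournament needs (hypothetical name)."""

    def play_game(self) -> str:
        """Play one game and return the winner's name."""
        ...


class Draughts:
    """Implements only what the tournament asks for; no move history at all."""

    def __init__(self, winners):
        self._winners = iter(winners)  # scripted results, for the example only

    def play_game(self) -> str:
        return next(self._winners)


def run_tournament(game: TournamentGame, rounds: int) -> str:
    # Depends on play_game() alone, so nothing here is chess-specific.
    wins = Counter(game.play_game() for _ in range(rounds))
    return wins.most_common(1)[0][0]


print(run_tournament(Draughts(["ann", "bob", "ann"]), rounds=3))  # ann
```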

Okay, this one is a bit easier to express:

If you declare something as taking a type, any instance of that type should be usable there.

Or, to put it another way, a Stock object is always a Stock object, it's never a cow.  The calculateValue() method should always calculate the value,  never fire a nuclear missile.  This is the Liskov Substitution principle, and is basically an injunction against creating objects that pretend to be one thing for convenience, when they're actually something else.

There's a very easy way to violate Liskov without noticing: check the type of an instance.  Nearly always, if you've got an interface IPerson and you use "typeof" or "is", you've written code that branches, usually an if statement.  Now take a look at that statement again, and consider what happens when someone writes a new implementation of IPerson.  Which side of the if statement does it fall?  Answer is, it doesn't matter, the next implementation might want either side.  Yep, your code's gonna break. 

In this case, what's happened is that you've basically broken encapsulation.  If you move that decision into the implementing classes, either as a boolean property or a virtual method, you'll solve the problem.  (I'll add that a boolean property is going to prove a lot more fragile than the virtual method, but it's massively easier to achieve.)
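A small illustration of the difference, with a made-up class hierarchy: the first function branches on the concrete type, the second asks the object.

```python
class Employee:
    def can_approve_expenses(self):
        return False


class Manager(Employee):
    def can_approve_expenses(self):
        return True


class Director(Employee):
    """Added later, by someone who never saw the type check below."""

    def can_approve_expenses(self):
        return True


def may_approve_by_type(person):
    # Fragile: every new implementation silently lands in the "else" branch.
    return isinstance(person, Manager)


def may_approve(person):
    # The decision lives with the implementation, so new classes answer for themselves.
    return person.can_approve_expenses()


print(may_approve_by_type(Director()), may_approve(Director()))  # False True
```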

The Bad News

Unfortunately for the Liskov Substitution Principle, it's completely impossible to achieve.  Every piece of code you ever write forms an implicit, stateful contract with its dependencies.  Even if you are fully Liskov compliant right now, the next function someone writes may contain an implicit assumption that's violated in a tiny proportion of cases.  Truth is, types are not a constraint system and trying to pretend like they are can be positively dangerous.

Bertrand Meyer understood this problem and created Eiffel.  Some of those ideas will make it into C#4.  James Gosling understood the problem, but for some reason thought that constraining thrown exceptions was the best solution.  The problem with Java exceptions actually helps us understand the problem with a slavish adherence to Liskov: premature constraints.  The Java exception paradigm expects the interface designer to be able to anticipate all possible implementations of the interface, and punishes the implementor when the designer got it wrong.

A Sensible Approach

Well, design by contract is coming soon and will definitely enable us to improve our code quality, but what can we do about this now?  First, there's just the basic "use common sense" directive: don't wilfully violate the behaviour that you'd expect of an implementation of an interface.  Sometimes it's unavoidable: a read write interface with an asynchronous implementation could behave quite differently from the synchronous implementation, and for good and valid reasons.  What you can do is to implement standard unit tests for implementations of an interface.  Here's how you do it:

  • Create an abstract test class with a method GetImplementation
  • Make all of the tests use the interface
  • Create multiple classes all of which override GetImplementation

Obviously, this creates a lot of tests, but it's probably the best way to specify expected behaviour right now.
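The three steps above can be sketched with Python's unittest module.  The stack classes are invented purely to have two implementations of one interface, and get_implementation plays the role of GetImplementation.

```python
import unittest


class ListStack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class LinkedStack:
    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        item, self._head = self._head
        return item


class StackContract:
    """Shared tests. Not a TestCase itself, so unittest never runs it directly."""

    def get_implementation(self):
        raise NotImplementedError

    def test_pop_returns_last_pushed_item(self):
        stack = self.get_implementation()
        stack.push(1)
        stack.push(2)
        self.assertEqual(stack.pop(), 2)
        self.assertEqual(stack.pop(), 1)


class ListStackTests(StackContract, unittest.TestCase):
    def get_implementation(self):
        return ListStack()


class LinkedStackTests(StackContract, unittest.TestCase):
    def get_implementation(self):
        return LinkedStack()


def run_contract_tests():
    """Run the shared tests once per implementation and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case)
        for case in (ListStackTests, LinkedStackTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_contract_tests()
```

Each new implementation costs one small subclass, and every test written against the interface is run against it automatically.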

Finally, you owe it to yourself to take time out and remind yourself that L stands for many other things too.


I don't tend to post things because they're funny, but this one was quite special.  I was interviewing someone recently and saw that they were using Entity Framework.  Now, I know very little about it, since I've expended my energy on learning NHibernate.  So, I asked "You're using Entity Framework.  What do you think of it?".  The reply I got:

Actually, that's one of the reasons I'm leaving my current employer.

I don't laugh out loud in interviews very often...


This is probably the biggest problem most people have with understanding indexes.  How exactly does SQL Server decide when to use an index?  There are two big errors that people make here: assuming it'll always "just work" (it doesn't) or that they should just force it to use the indexes they think it should be using (it's nearly always slower if you do this).  Instead, it's best to understand the selection process and see if you can structure your query in such a way that it accesses the indexes you want.

First things first.  It's never going to use two indexes on the same table access.  If you use a table twice in the same query, it might use different indexes, but otherwise you should only ever see one index.  Next, and this is the really important bit: column order matters.  If you've got an index on three columns and your query uses two columns but not the first column, it won't use the index.  (If you use the first and third, it might use it, but it'll score it the same way as just using the first.)

Other considerations:

  • As discussed previously, it might decide to use index covering.  If the columns match up in order as well, so much the better.
  • If statistics are out of date, it can get its decisions wrong.
  • If you join two columns and they're not of the same type, it can make the wrong decision.  I've seen this happen when the only difference is nullability, but not in recent versions.
  • Inequalities aren't as selective as equalities.  Typically, given the choice between an index it can use with a greater-than and an index it can use with an equals, it'll choose the equals.  Again, this is driven by the statistics.
  • If the table is very small, it might be faster to do a table scan.
  • In very rare circumstances, it'll build an index and then query it.  This almost never happens; you usually have to break up your query and create temporary tables to achieve this effect.

But the most important thing to remember is: it will only score the index based on the first columns in the index that contiguously match your query.  Miss out one column and it'll just ignore any subsequent ones.  Miss out the first column and it'll probably never use your index except for index covering.
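As a toy model of that rule, and nothing like SQL Server's actual cost-based optimiser, the function below returns the leading index columns a query can use.  The column names are hypothetical.

```python
def usable_prefix(index_columns, predicate_columns):
    """Return the leading index columns that the predicates match contiguously."""
    matched = []
    for column in index_columns:
        if column not in predicate_columns:
            break  # the first gap ends the match; later columns are ignored
        matched.append(column)
    return matched


index = ["customer_id", "order_date", "status"]  # hypothetical three-column index

print(usable_prefix(index, {"customer_id", "order_date"}))  # ['customer_id', 'order_date']
print(usable_prefix(index, {"customer_id", "status"}))      # ['customer_id']
print(usable_prefix(index, {"order_date", "status"}))       # []
```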


My previous article emphasized that you shouldn't mix clustered indexes and identity fields.  However, if you're using NHibernate you probably already know you shouldn't be using identity fields.  On the other hand, the points still generalize to some of the NHibernate generators:

  • Increment: just as bad as identity
  • HiLo:  better than identity, but not by much.  Don't mix with a clustered index.
  • GUID:  Extremely random.  In fact, probably too random.  The inserts get plastered everywhere and can hurt performance.
  • GUID Comb: Better than GUID,

In short: still don't put a clustered index on an identity field.  Modifying GUID comb to include the thread ID might actually make it viable.
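For readers who haven't met the comb idea, here is a deliberately simplified sketch: keep most of a random GUID and overwrite its last six bytes with a timestamp, so that values generated close together in time share a tail.  This is an assumption-heavy illustration, not the algorithm from Jimmy Nilsson's article or NHibernate's generator (both encode the time differently), and it does not include a thread ID.

```python
import time
import uuid


def comb_guid(now_ms=None):
    """Random GUID whose last six bytes are replaced by a millisecond timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    random_part = uuid.uuid4().bytes[:10]
    # Keep the low 48 bits of the timestamp so it fits in six bytes.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(now_ms=1_000)
later = comb_guid(now_ms=2_000)
print(earlier, later)
```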

Again, we're talking OLTP here.  If you read the original article, Jimmy Nilsson measures batch insert performance, and the concurrency implications never come up.