July 2009 Blog Posts

Another thing that we covered at my recent presentation was common use cases of Singletons.  The problem with artificial examples is that you're often presented with objections that the point you're making is restricted to the particular example you've given.  So, instead, let's take a look at some of the most common usages of singleton patterns and explain why you're breaking your code to use them:

The Current User

I've lost count of the number of times I've seen the current logged-in user accessed through a static method.  It's extremely tempting: there's a phenomenal number of parts of your system that rely on knowing it, and you don't really want to keep passing the user around everywhere you go.

The problems start when you look at the implementation.  Chances are, if you're writing a web application, you're pulling that information out of the HttpContext (or worse, using a component that you don't know uses the HttpContext).  The first time you try to add in some batch processing in a console application, you're going to discover one of your main dependencies just fell away and you don't have a backup.  You could try simulating HttpContext.Current, and you might even succeed (I doubt you'll enjoy it) but you'll now have a console app that pretends to be a web application just to support some code that you're already starting to think of as legacy.

However, even if you could deal with this, you've got other problems.  What happens if you wanted to be able to impersonate another user?  It might not sound like much of a concern right now, but wait till you're supporting the application and want to see what the other guy can see.  Sure, you could deal with this by hacking around with the static methods, but it would have been a whole lot easier if you'd just passed the relevant routines the user you wanted.

Another concern: sometimes a workflow has to go through multiple users; an approval process is an obvious example.  Do you really want user A to ask user B to do something and then discover user B doesn't have the permission?  Much better to be able to pass a user other than the current user into the permission logic and let user A be able to see who exactly can help him.  All of a sudden, your permission system doesn't rely on the current user at all.

You might be thinking that all of this could be avoided by designing all of these features in at the start, but you're going heavily into waterfall thinking there, a methodology that has been comprehensively found wanting.  Better is to start with a flexible design that allows you to change behaviour when you need to, and for that you need to be passing around instances, not calling static methods.
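To make the point concrete, here's a minimal sketch of permission logic that takes the user as an argument rather than reaching for a static "current user".  The User and PermissionService types and the "approve" permission name are made up for illustration; they aren't from any real library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical user type; only the pieces the example needs.
public sealed class User
{
    public User(string name, IEnumerable<string> permissions)
    {
        Name = name;
        Permissions = new HashSet<string>(permissions);
    }

    public string Name { get; }
    public ISet<string> Permissions { get; }
}

// Permission logic is handed the user it should check, so the same code
// serves the logged-in user, an impersonated user or a potential approver.
public sealed class PermissionService
{
    public bool CanApprove(User user) => user.Permissions.Contains("approve");

    // Lets user A see who could actually approve a request.
    public IEnumerable<User> WhoCanApprove(IEnumerable<User> candidates)
    {
        foreach (var candidate in candidates)
            if (CanApprove(candidate)) yield return candidate;
    }
}
```

Nothing here knows about HttpContext, so the same classes work unchanged in a console batch job or a test.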

The Current Time

If you've got DateTime.Now in your code, there's a good chance you've got a bug nearby.  Think about what happens when Daylight Saving Time kicks in.  However, even if you've changed your code to read DateTime.UtcNow, you're still going to have all sorts of problems.  Here's a concrete example: I have an order processing system with a batch job for Day End.  It uses the current time all over the place, including the truly basic task of working out what day it needs to be processing.

So, what if I want to run the batch on specific data?  On a day anything other than today?  Well, I'm going to have to change the system clock.  You thought it was bad trying to fake an HttpContext?  Faking the current time is much, much worse.  You're actually messing with the BIOS.  All because you didn't write the following code:

public interface IDateProvider {
    DateTime Now { get; }
}

You can go significantly further down this road: I have unit tests in one project of mine that explicitly test whether or not the code behaves correctly in Jersey City.  I wouldn't be able to do that if I hadn't abstracted out the concept of time from the system.
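As a rough sketch of how that interface gets used: one implementation wraps the real clock, another always reports a fixed instant, and the batch job only ever sees the interface.  SystemDateProvider, FixedDateProvider and DayEndBatch are names invented for this example.

```csharp
using System;

public interface IDateProvider
{
    DateTime Now { get; }
}

// Production implementation: the only place that touches the system clock.
public sealed class SystemDateProvider : IDateProvider
{
    public DateTime Now => DateTime.UtcNow;
}

// Test or re-run implementation: always reports the instant it was given.
public sealed class FixedDateProvider : IDateProvider
{
    private readonly DateTime _now;
    public FixedDateProvider(DateTime now) { _now = now; }
    public DateTime Now => _now;
}

// Hypothetical Day End job that asks its provider which day to process.
public sealed class DayEndBatch
{
    private readonly IDateProvider _dates;
    public DayEndBatch(IDateProvider dates) { _dates = dates; }
    public DateTime ProcessingDay => _dates.Now.Date;
}
```

Re-running last Tuesday's batch is then a matter of constructing DayEndBatch with a FixedDateProvider for last Tuesday, with no system clock involved.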


Logging

It's amazing how many people approach logging through a singleton.  If you're using a framework, you might not care about this:  there are people who've pretty much written the last word on logging and all you need to do is write a configuration file.  The very completeness of the solutions can lead you to thinking that static methods are the right solution for the problem.  Actually, they're not.  The completeness of the solution has merely minimized the damage.

In our firm, we've got a set of static methods used to write to the event log.  Now, unlike using something like log4net, this isn't a massively configurable and complete solution, it's just what the developers at the time wrote.  So, you can't affect policy: you can't, for instance, filter out logging on any level other than changing the logging library.  You can't disable logging if you're running tests.  In practice, you can't do very much at all.  Contrast this with the solution in one of our fix feeds.  It uses log4net, but more than that, it uses Castle's logging facility.  Here, we get a logging object passed into the constructor of the class that needs logging.  This logger is specific to the class that is using it: try implementing that using static methods.  (You can: you just need to either pass the caller into every last call or walk the stack trace.  Neither solution is desirable.)

Again, the singleton pattern actually makes things worse, not better, for logging.
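Here's a small sketch of the constructor-injected style.  The ILog interface and the NullLog, MemoryLog and FixFeed classes are stand-ins written for this example, not the log4net or Castle types; a real project would use whatever interface its logging library provides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal logging abstraction.
public interface ILog
{
    void Info(string message);
}

// Writes nowhere: handy for tests that don't care about log output.
public sealed class NullLog : ILog
{
    public void Info(string message) { }
}

// Collects messages in memory and tags them with the owning type's name,
// which is how a logger can be specific to the class that uses it.
public sealed class MemoryLog : ILog
{
    private readonly string _owner;
    public MemoryLog(Type owner) { _owner = owner.Name; }
    public List<string> Messages { get; } = new List<string>();
    public void Info(string message) => Messages.Add(_owner + ": " + message);
}

// The class that needs logging receives its logger through the constructor.
public sealed class FixFeed
{
    private readonly ILog _log;
    public FixFeed(ILog log) { _log = log; }
    public void Connect() => _log.Info("connecting");
}
```

Disabling logging in a test is just a matter of passing a NullLog; no policy has to be baked into a static method.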


Configuration

Anyone who's dealt with .NET's config files for a while will have come across some standard problems:

  • The typeless nature of AppSettings is a pain in the neck.
  • Access to configuration settings can come from anywhere.
  • You've often got to include multiple different settings for the same value, to support different assemblies' interpretations.

The last two are because ConfigurationManager is a singleton.

Let's see how this happens: ConfigurationManager exposes a public static property for AppSettings.  Anyone can use AppSettings however they like, and guess what?  They do.  Now, here's a common approach to dealing with this problem:

  • Only one class can access AppSettings.
  • Often, this class is a static class, so only has static accessors.
  • Any conversion to correct types is handled by this class.

Now, this is much better, and addresses the previous problems, but there are still some problems.  The first comes directly from the singleton nature of the solution.  For instance, if you've got a single database connection, it might not seem like that big a problem, but the day you have two instances and want to load data from one to the other, you'll discover that the fact your data access is hardwired to a particular configuration setting is causing you problems.  This is because you're still breaking the Single Responsibility Principle: your data access class shouldn't be making decisions about how to handle configuration, and accessing a static method on a configuration class isn't different from reading the configuration setting directly as far as dependency management is concerned.

Another problem is that having a "configuration" class doesn't really scale: the more assemblies you have, the less likely they are to want to share their configuration settings.  Now, you can always just simply have separate configuration settings for different concerns.  This is actually a good idea, but it still doesn't deal with what happens when you actually wanted to share configuration settings.

Truth is, most classes, and even most assemblies, shouldn't need to know about configuration settings at all.  Configuration isn't like logging: logging decisions have to be in the class doing the work, configuration decisions don't.  Why not just pass the connection string into the constructor?  Same with the mail server, same with the retry policy.  Now, the only place that needs to read configuration is the Main method.
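A sketch of what that looks like, under some assumptions: OrderRepository, CopyJob and CompositionRoot are invented names, and the dictionary stands in for whatever the Main method actually read from the config file (the "SourceDb" and "TargetDb" keys are made up too).

```csharp
using System;
using System.Collections.Generic;

// Data access knows nothing about configuration: it is handed a connection string.
public sealed class OrderRepository
{
    public OrderRepository(string connectionString)
    {
        ConnectionString = connectionString;
    }

    public string ConnectionString { get; }
}

// Copying between two databases is just two instances with different strings.
public sealed class CopyJob
{
    public CopyJob(OrderRepository source, OrderRepository target)
    {
        Source = source;
        Target = target;
    }

    public OrderRepository Source { get; }
    public OrderRepository Target { get; }
}

public static class CompositionRoot
{
    // Called from Main with the settings it has read; the only code that
    // knows which setting feeds which object.
    public static CopyJob Build(IDictionary<string, string> settings)
    {
        var source = new OrderRepository(settings["SourceDb"]);
        var target = new OrderRepository(settings["TargetDb"]);
        return new CopyJob(source, target);
    }
}
```

The "two instances" problem from earlier disappears, because nothing below the composition root is hardwired to a particular setting.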

It's all Microsoft's Fault

Bill actually has a lot to answer for on this.  In each case we've dealt with, the design of the .NET API has led us into long-term problems for a bit of short-term gain.  Developers often recognize that they've got a problem with HttpContext, the current time or configuration soup, but they don't know what to do about it.  The .NET framework has lots of these somewhat curious decisions: an API is developed for the lowest common denominator, with the understanding that more sophisticated developers will code around it.  At least it's usually possible to do so.  Static methods, where they have to be used, can usually be wrapped in instance methods with relatively few ill effects.  In the first case, a simple IIdentity interface which returns the name of the current logged on user can hide an awful lot of HttpContext-related problems.  But you do need to understand that this coupling happens, that it's damaging and how you can avoid it.

To re-iterate, singleton patterns are dangerous, even those that Microsoft have implemented.

Sun has always felt a need to educate its developers.  Sometimes this has led to Pearl Harbour scale disasters like J2EE, but it has also produced an extremely technically literate community.  On the other hand, sometimes you wish Microsoft didn't even try.  I've pretty much come to the conclusion, for instance, that the Microsoft Enterprise Library is the Wrong Thing.  Every so often, we come across stuff which falls under the heading of "mistakes you need a PhD to make", as Francis Fukuyama describes his advocacy of the invasion of Iraq.  The provider pattern is top of my list here.  If you're not familiar with this, it's a Microsoft-specific form of pluggable singleton.  It's a singleton by virtue of the "default provider" mechanism.  It's extremely over-complex and, in my experience, just plain doesn't deliver any benefits that plain old using constructors wouldn't achieve better.

By combining the singleton pattern with a pluggable architecture, they hoped to draw the poison from the pattern.  Anyone who's used it will know this isn't the case. 

  • Sometimes the pluggability just plain fails: try finding the parent node from a plugged sub-sitemap.
  • Sometimes its insistence on using concrete types for everything makes your code nigh-on impossible to implement (especially if some third party made the same decision...)
  • Since it's a singleton and hence can have shared state, you need to be writing thread-safe code.  Not a trivial task for a neophyte developer who just wanted a bit of pluggability.
  • Since it doesn't have a coherent dependency injection model, you often end up using the Microsoft configuration model to get anything done.  (You do get a set of string name/value pairs, but any complex dependencies will fall down badly.)
  • Worst, when you finally discover that you actually wanted two of something, you get reminded that the provider pattern remains a special case of the singleton pattern.

It is in many ways really impressive, but that's what makes it especially pernicious: it picks up a lot of developers who are trying to improve and leads them down blind alleys.  You can spend a lot of time supporting a provider pattern.  When you start to figure out that it's not really paying back the investment, you're going to feel that much of this patterns stuff is just nonsense.  Tell me you don't know a developer like that...

Ironically, you know one really obvious user of the provider pattern?  Subtext, the blogging engine that powers, um, this site...

So, I gave the first of my talks about Design Patterns last week.  I concentrated my attention on the Command and Singleton patterns.  My colleagues weren't particularly interested in the Command pattern, but my remarks on the Singleton pattern raised a lot of interest.  For many, it was the first time they'd really heard someone come out and say that static methods were a bad idea.  It was ironic that Max Pool was blogging about the uselessness of evangelism on modern programming techniques while I was having a positive experience doing exactly that.

The thing is, everyone is used to a certain way of doing things.  They know that using static methods and shared state takes time.  They know that they always end up with dependency soup, but it's usually thought that this is just what programs are like.  To a certain extent, it's always going to be hard to eliminate externalities, but it's a lot easier than most people expect.

Constructor Injection, equally, is really easy to explain: you just pass things into the constructor.  Developers who write a lot of tests can instantly see the advantages of doing things that way.

None of this makes it easy to be the guy in the room saying the exact opposite of what most people expect, but it's very rewarding when it comes off.

Here's some talking points:

  • Business requirements change, and they usually change in a way you're not expecting.
  • If you only need one instance, creating it in the Main method and passing it into the objects that need it is much more flexible than using a Singleton pattern.
  • If something is public, it will get used: you can't create a static method and then say that people shouldn't use it.  They will, and it'll be your fault.
  • Constructor Injection is a very low cost thing to implement when you're writing new code. 
  • Refactoring old code to use it is much harder, but that reflects the refactoring challenges inherent in Singleton-style code.
  • If you're passing a lot of objects down a function chain, that's a code smell.  Chances are that the "group of objects" is a good candidate for a class.  Once you understand what that class is actually called, you're on your way to a better design.
  • Passing lots of objects in constructor chains isn't as easy to deal with.  Dependency injection containers make this problem manageable.  (Amongst other things...)
  • Evaluating all of your dependencies up front can lead to problems with circular dependencies.  Usually the best way to deal with this is to redesign objects so that they don't have circular dependencies, but property injection can help in ugly cases.
  • Ironically, developers often spend a lot of time trying to think of the best way to make objects interact, how to load configuration settings and so on.  Constructor Injection makes this simple: it's always in the constructor.
  • Constructor Injection isn't quite the end of the story.  If you actually need more than one object, we need to start talking about abstract factories.

The sooner you start using constructor injection, the sooner refactoring your code will stop feeling like playing Jenga.

Technorati Tags: Singleton,Inversion of Control,Abstract Factory

I'm not massively fond of Castle's diagnostics.  There's certainly no general framework such as StructureMap has: you just ask for something and wait for the inevitable exceptions.  However, the guy who wrote this bit of code for PerWebRequestLifestyle will be bought a drink if I ever meet him:


It's amazing how often people will write FAQs explaining obscure error messages when sticking the diagnostics directly into the code would be more convenient, both for them, and the users.

Technorati Tags: Diagnostics,Castle Windsor

Okay, so a blog isn't the ideal format to dump 7000 words worth of thoughts on the subject of a book that weighs half a kilo.  :)

I used to write film reviews: each fortnight I'd knock out about ten 50 to 100 word reviews of movies on in the local area.  I learnt a basic rule of criticism: it's an awful lot easier to be nasty than nice.  I've had the same challenge here:  the better a pattern is, the harder I find it to write about it.  Lord knows I find it easy enough to write about Singletons.

The dry and formal style of Gamma et al. disguises this problem: it attempts to be even-handed the entire time, so the question of emphasis is moot.  Moreover, my opinion of which are the "best patterns" is, to a certain extent, coloured by my perception of which are "unobvious".  Thus, those that I find hardest to describe are exactly those I want to describe the most.  (This isn't strictly the case: I'd regard the visitor pattern as the hardest of the lot to describe.)

There's also the difficulty that some of the patterns are so similar that they've generated an entire sub-industry discussing them.  I wouldn't worry about this too much:  many patterns are useful only to the extent to which they clearly and succinctly describe a concept.

Table of Contents

Obviously, this being a blog, the structure of this changes with time.  Bits get revised and expanded upon.  This is a suggested reading order.  Patterns in brackets are not classic Gang of Four patterns.


Trash Talking?

A note about the last category: you could say I've trashed a third of the book, but that's not the case.  What I’m saying is that modern practice has moved on:

  • Iterator and Interpreter are pretty much built in these days
  • Observer and Mediator are components of Publish/Subscribe
  • Bridge / Template Method / Strategy are special cases of good practice.
  • Flyweight, again, is basically just a special case of a more general piece of good practice.

So, they’re not terms I use (although Rx means I’ve got to dust off observable…) but that doesn’t mean the concepts are bad.  The only true anti-patterns are Singleton and Prototype.  (Memento isn’t that useful, but if you’re in its use case space and can’t achieve the same effect with an event driven design, it’s a good option.)

Beyond The Gang of Four

Here's some patterns not listed that everyone should know:

  • Supervising Controller
  • Circuit Breaker
  • Publish / Subscribe (of course)
  • Dependency Injection, especially constructor injection. 
  • Unit of Work
  • Active Record

Maybe I'll write some stuff about those another time.  For now, I have a presentation to give...


*I’m going to have to do some work on observer.  My argument that you can ignore it and skip straight to publish/subscribe was fine as long as the Rx framework didn’t exist.

NOTE:  This article gets updated in line with the blog.

This is the last pattern I'm writing up, and it's one of my favourites.  The chain of responsibility pattern is a brilliant way of dealing with special cases and functional complexity, and it does it in a way that allows you to gain the advantages of hard coding without the disadvantages of inflexibility.  Here's the basic idea: you have a list of Handlers.  Each has two operations:

  • Do I handle this message?
  • Process the message.

The original GoF statement of chain of responsibility implements the list as a linked list, hence the term "chain".  This has the advantage of allowing handlers to act upon a message and pass it on, but this is rarely used.

Returning to the example I gave for the visitor pattern, we could implement handlers for executions, allocations and orders.  The chain would run through the list until it found a handler that dealt with the message, and would then dispatch the message to that handler.  The great strength of the pattern is dealing with special cases.  Let's say that orders for one particular client always come with the price and the amount swapped round.  We can simply add an extremely special case handler that explicitly deals with this.

interface IRule<TValue, TResult> {
    bool Handles(TValue value);
    TResult Handle(TValue value);
}

The principal time that you know you want to use this pattern is when you see large amounts of special case code.  However, it's also a good way of dealing with excessive generality:

  • Great big XML files
  • Database Decision Tables
  • DSLs

If you find yourself in a situation in which one of these is being proposed, it's well worth examining whether or not a chain of responsibility would be better.  It can also be used in conjunction with these approaches.  For instance, one handler could be driven off a decision table, with a couple of special cases being handled by other handlers.  This can reduce the complexity of the decision table, so that it only deals with those cases which are easy to generalize.

I haven't spent a lot of time on this pattern, because my personal experience was that I appreciated how useful it could be as soon as I saw it.
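For the record, here's one way the list-based version might be wired up, using the IRule interface from above.  RuleChain, Order, SwappedFieldsRule and DefaultOrderRule are names invented for this sketch, and "AwkwardClient" is a placeholder for the client who sends price and amount swapped round.

```csharp
using System;
using System.Collections.Generic;

public interface IRule<TValue, TResult>
{
    bool Handles(TValue value);
    TResult Handle(TValue value);
}

// Runs through the rules in order and dispatches to the first one that claims the value.
public sealed class RuleChain<TValue, TResult>
{
    private readonly IList<IRule<TValue, TResult>> _rules;
    public RuleChain(IList<IRule<TValue, TResult>> rules) { _rules = rules; }

    public TResult Handle(TValue value)
    {
        foreach (var rule in _rules)
            if (rule.Handles(value)) return rule.Handle(value);
        throw new InvalidOperationException("No rule handles this value.");
    }
}

// Hypothetical message type.
public sealed class Order
{
    public string Client { get; set; }
    public decimal Price { get; set; }
    public decimal Amount { get; set; }
}

// Special case: this client's price and amount arrive swapped, so swap them back.
public sealed class SwappedFieldsRule : IRule<Order, Order>
{
    public bool Handles(Order order) => order.Client == "AwkwardClient";
    public Order Handle(Order order) =>
        new Order { Client = order.Client, Price = order.Amount, Amount = order.Price };
}

// General case: accept the order as it stands.
public sealed class DefaultOrderRule : IRule<Order, Order>
{
    public bool Handles(Order order) => true;
    public Order Handle(Order order) => order;
}
```

Order matters: the special case goes ahead of the catch-all in the list, and adding another special case means adding another class rather than another branch.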

This is actually about the state pattern, but let me talk first a bit about enums.  A big thing to think about is that "if" statements and especially "switch" statements can sometimes be a code smell.  Enums, in particular, can be a sign that something is wrong with your code.

Here's an exercise to see exactly why enums are a problem: take some code you've written (not somebody else's:  other people's code is always bad...) and identify an enum that you've defined.  An enum with a large number of values is particularly promising, but even tri-states can illustrate the problem.  Now search through your code and document how often you branch on the basis of the enum.  I'm betting it's pretty common.  Now, that in itself is a problem, but here's where it gets worse: I guarantee that those branches are all over your code.  Picture what happens if you add in another entry to your enum.  Think of how many unrelated lines of code you'd need to check.  To make it harder, imagine that the new value is very similar, but not quite identical, in behaviour to another one.  It's time, basically, to replace your enum concept with a proper class.

There's three cases that you need to think about when looking at an enum this way:

  • Does the enum basically represent an action?  In that case, introducing a command pattern may be the best idea.
  • Is the enum mostly used to branch an algorithm?  Here, a strategy pattern-style outsourcing of responsibility may be the best approach.
  • Is the enum a changeable property of an entity?  In that case, the state pattern is probably appropriate.

I'll observe that most of the arguments above also apply to boolean variables:  you need to be very careful with exposing state that is used for decision making in your code.  It can be very easy to end up with a mass of untracked dependencies.  In some ways, booleans can be even worse, because they're harder to refactor out.

Java Enums

I've touched on the idea that a lot of patterns address deficiencies in our programming languages, and this is a classic case.  Enums in C# and C++ are basically a fairly light syntactic wrapper around an integer.  Enums in Java are much better.  They're classes with a restricted number of possible values.  You can even override methods for particular instances.  This makes implementing these patterns much more natural in Java than they are in C#.  I wish C# had Java's enums, but it seems unlikely that Anders is going to bother.  So, to implement these patterns in C#, you're going to need to do something like Kent's approach for representing enums as classes.

On the other hand, it's perfectly possible to introduce the same problems in Java code as in C# code if you don't understand what the problem actually is.  In particular, liberally using the enum literals will drag you down the same road, no matter how good your intentions. 
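As an illustration of the general idea in C# (not a reproduction of Kent's implementation), here's a class with a fixed set of instances that carries its own behaviour.  OrderStatus and its CanAmend property are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// A class with a restricted set of instances, standing in for an enum.
public sealed class OrderStatus
{
    public static readonly OrderStatus Open = new OrderStatus("Open", canAmend: true);
    public static readonly OrderStatus Filled = new OrderStatus("Filled", canAmend: false);
    public static readonly OrderStatus Cancelled = new OrderStatus("Cancelled", canAmend: false);

    // Private constructor: no one else can add values.
    private OrderStatus(string name, bool canAmend)
    {
        Name = name;
        CanAmend = canAmend;
    }

    public string Name { get; }

    // Behaviour lives on the value, so callers never branch on which value it is.
    public bool CanAmend { get; }

    public static IEnumerable<OrderStatus> All => new[] { Open, Filled, Cancelled };

    public override string ToString() => Name;
}
```

Adding a new status that is "very similar, but not quite identical" to an existing one then means adding one line here, not hunting down every switch statement.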

The State Pattern

Quite often you have objects with a status or state property, usually representing a stage in a workflow process.  The state pattern doesn't necessarily only apply to these cases, but it's quite a common use case.  More generally, if a property of an object has significant behavioural implications, it can be a candidate for the state pattern.  To emphasize this, let's take an example which isn't a workflow state:

Let's say we have a system for tracking changes to your live servers.  You've got three basic release types:

  • Routine maintenance
  • Ordinary, planned releases
  • Emergency work

Now, any given release might actually change which process it uses at any given time, complicating matters.  Consider if you want to know if a release is fully approved.  So, you represent the release type with an IReleaseType interface and add an IsFullyApproved method.  Here you've got a choice as to whether your release type object should take its parent object as state.  If it does, the interface is simpler since it doesn't keep having to ask for the parent object.  On the other hand, it's no longer got useful enum behaviours such as there only being one instance representing "Routine Maintenance".  You're not violating best practice either way.
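Here's a sketch of the variant where the release type does not hold its parent, so the parent is passed into each call.  The approval thresholds are invented purely to give the three types different behaviour, and the Release class is a hypothetical stand-in.

```csharp
public interface IReleaseType
{
    bool IsFullyApproved(Release release);
}

// Assumed rule for the example: routine maintenance needs no sign-off.
public sealed class RoutineMaintenance : IReleaseType
{
    public bool IsFullyApproved(Release release) => true;
}

// Assumed rule for the example: planned releases need two approvals.
public sealed class PlannedRelease : IReleaseType
{
    public bool IsFullyApproved(Release release) => release.Approvals >= 2;
}

// Assumed rule for the example: emergency work needs one approval.
public sealed class EmergencyWork : IReleaseType
{
    public bool IsFullyApproved(Release release) => release.Approvals >= 1;
}

public sealed class Release
{
    public Release(IReleaseType releaseType) { ReleaseType = releaseType; }

    // The release can switch process at any time by swapping its state object.
    public IReleaseType ReleaseType { get; set; }
    public int Approvals { get; set; }

    // The release never branches on its type; it just delegates.
    public bool IsFullyApproved => ReleaseType.IsFullyApproved(this);
}
```

Because the state objects here hold no per-release data, a single shared instance of each would do, which keeps the enum-like behaviour mentioned above.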

You should definitely take a look at Davy Brion's implementation of the circuit breaker pattern, which uses a state pattern.  (Incidentally, if you're having trouble following the code, bear in mind that a circuit breaker prevents activity when it's open.  When it's closed, it passes electricity through.)

Enums as Classes

Although the State pattern is a useful and important pattern, it's part of a much larger approach: getting rid of enums and replacing them with classes.  We've seen that Command and Strategy can also be used to address similar concerns.  Don't get too hung up on whether it's a State, Strategy or Command: replacing a rather fragile enum with a proper object instance is the order of business.  Once you've made the decision, a lot of standard object design principles will come into play, making the design more obvious.  As a rule of thumb, try to avoid referring to an exact enum:  e.g. "State.Open" unless you're actually passing something to a constructor or writing a test.  The rest of the time, it should be the methods and properties of the "state" that are being used.

One more thing you need to deal with when using this pattern is what you do about persistence.  Basically, is your enum-like object an entity or just a value?  Both have their advantages.  If you want to go with a value, you'll need to deal with persistence concerns: implementing IUserType for NHibernate and implementing a model binder in ASP.NET MVC.

I recently had a rather incoherent rant about why Singleton is an anti-pattern.  Let's say that you decided that static methods needed to be eliminated from your code base.  So you embark on the refactoring to end all refactorings.  That's exactly the situation I'm in at the moment.  The irony is that the static methods I'm trying to eliminate are calls to StructureMap's ObjectFactory.GetInstance.  Yes, riddled through the code are calls to services.  Nothing is denied to any object. 

If you start trying to replace static methods with proper instances, you're going to be asking yourself a lot:

  • Why does that object need a connection to that system?
  • For crying out loud, why do I need to add an extra constructor parameter to support only one method?
  • Why exactly can't I just store the current user somewhere?
  • But now I've got X, which has a dependency on Y, and Y has a dependency on X, does this make any sense at all?
  • How many methods am I going to have to change?
  • Does everything have to be an instance method?
  • Exactly how many levels do I need to pass this object down?
  • Why does this object take a dependency on every service?
  • How long is this going to take?
  • Is this really worth it?

The answer to the last question is yes.

Static methods are like crack: they're convenient, they solve your problems quickly and you quickly become dependent upon them.  You're so dependent you don't even know you've got a problem.  You may even read an article like this and believe that it doesn't apply to you, because you've got your static methods "under control".

Now, if you've never experienced a code base that's not riddled with static methods, you've never felt what it is to be clean, to be able to analyze your dependencies as easily as being able to examine your constructor, to be able to replace sections of your system and build new systems out of the components you've created without worrying about support functionality you're not actually using.  It feels good, but before that, it's going to feel really bad.  I don't mean really bad in the "non-technical manager who always opposes refactoring" bad, I mean bad as in "This was a huge mistake and I've no idea how I'm going to get it working again" bad.

What's worse, going cold turkey is something you have to do on your own.  This one is hard for even other team members to help you with.  Branching isn't going to help you this time; it's best to just have a code freeze.  But I figure it won't hurt to talk, so here's answers to the questions earlier:

  • Almost certainly, because your object design is wrong.  Just get it working.  We'll worry about how to fix the design another time.
  • Well, you might not have to.  I'll show you how to deal with that later.
  • You can, but that "somewhere" is going to be another constructor parameter e.g. ICurrentUserProvider. 
  • It doesn't, and it never did.  It's just that static methods allowed you to continue with this state of affairs.  You're going to have to break the circular dependency.
  • About 50%.  I didn't say it was going to be easy.
  • Yes, it pretty much does.  It doesn't really perform any worse, so don't worry about it.
  • Ridiculous numbers, especially if your object design is wrong or you're not using a dependency injection tool.  Just live with it for now, we'll come back to this another time.
  • Because it's assaulting the single responsibility principle with an axe.  You're going to want to restructure this at some point.  Not now.
  • It took me three hard slog days with pretty much the entire code base checked out.

These two patterns deal with times that you've got a requirement to be able to undo work.  There's Memento, which is a pattern of limited use (if only because a lot of the classic implementations are already available for you to use), and Command, which is one of the most criminally under-used patterns in the whole book.


Memento

Let's deal with Memento first.  Personally, I think most of the problem with Memento is its name.  It's a checkpoint.  You store checkpoints as you're working, and you restore back to a checkpoint if something doesn't work.  A good design for a limited set of use cases.  Why aren't you going to need this very often?

  • Because you're using a transactional system such as a database and can use the transactions directly.
  • Because you've followed good design principles and avoided having a lot of mutable state in your code.
  • Because your reversibility concerns are handled by using the command pattern.

If you're wondering how on earth you could implement this behaviour without mutable state, try taking a look at Eric Lippert's post on immutable stacks.  Usually, an approach like this is likely to be a better solution.

An interesting example where it is (probably) used is Prince of Persia: Sands of Time.  In the game, you can hit a button that reverses time.  By storing the state of every actor in recent frames, it can just as easily rewind them.  The example also highlights the importance of keeping the mementos small: if it, for instance, chose to serialize the walls every frame, the game would grind to a halt. 
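A toy version of the checkpoint idea, to show how small a memento can be.  This is an illustrative sketch only, not how any real game does it; the Actor class and its members are invented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical actor whose position can be checkpointed.
public sealed class Actor
{
    public int X { get; private set; }
    public int Y { get; private set; }

    public void MoveBy(int dx, int dy) { X += dx; Y += dy; }

    // The memento: a small immutable snapshot holding only what is needed to restore.
    public sealed class Checkpoint
    {
        internal Checkpoint(int x, int y) { X = x; Y = y; }
        internal int X { get; }
        internal int Y { get; }
    }

    public Checkpoint Save() => new Checkpoint(X, Y);

    public void Restore(Checkpoint checkpoint)
    {
        X = checkpoint.X;
        Y = checkpoint.Y;
    }
}
```

The caller keeps the checkpoints (a Stack<Actor.Checkpoint> works well for rewinding) and hands one back to Restore when time needs to run backwards.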


Command

Let's be clear: the command pattern is stone cold brilliant.  I know I've just been going on about the Memento pattern.  Now it's time to wake up.  Also every UI you create, every workflow you implement, has a Command pattern in it.  Well actually, it doesn't, because you don't know the Command pattern and it's not obvious.  But it should.

Here's the basic form of the command pattern:

interface ICommand {
    void PerformAction();
}

Now, in itself, this isn't that interesting.  So far, we've managed to represent an action (code) as an object (data).  As such, it might as well just be a function delegate.  However, where things get interesting is in the principal variation to the pattern.

Undo Command

To add reversibility, we can extend the interface like this:

interface ICommand {
    void PerformAction();
    ICommand UndoCommand();
}

We've got something much more interesting.  All of a sudden we've got a pattern that you should be implementing in every GUI you ever write.  This is the answer to how Word allows you to undo multiple operations.  If we return to the Prince of Persia example, each monster would have a currently executing command, and the command object would need to provide additional information such as animation frames.  (Don't worry, I'm not about to start blogging about how to write a computer game.)

The reason this is so hugely important is that you need it from Day One.  Adding undo features to a design that doesn't implement the command pattern can be an exercise in frustration.  Obviously, the undo command of an undo command should be a redo command.  Generalizing further, you get to concepts such as the layers in PhotoShop, where you keep track of your action history on an object and can insert and replace actions. 

You'll note that GMail generalizes this concept in a couple of interesting ways:

  • A "following command" concept.  This is usually undo, but can be something else.  For instance "send an email" has a following command of "view the email you just sent".
  • Actually, sometimes there are multiple following commands.  This enables such functionality as "Invite this user to gmail".
  • Certain commands cannot be undone, and the UI supports highlighting those exceptions.  (You could just implement this by returning null for the Undo command, but that would prevent you from specifying behaviours when reversibility is lost.)
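
Going back to the plain undo command, here is a rough sketch of how undo and redo fall out of the interface.  AppendText, RemoveSuffix and CommandHistory are hypothetical names, not part of any framework, and ICommand is repeated so the snippet stands alone:

```csharp
using System.Collections.Generic;
using System.Text;

interface ICommand {
    void PerformAction();
    ICommand UndoCommand();
}

// Hypothetical command: appends text to a document buffer.
class AppendText : ICommand {
    private readonly StringBuilder document;
    private readonly string text;
    public AppendText(StringBuilder document, string text) {
        this.document = document;
        this.text = text;
    }
    public void PerformAction() { document.Append(text); }
    public ICommand UndoCommand() { return new RemoveSuffix(document, text, this); }
}

// Inverse of AppendText; its own undo is the original command, i.e. a redo.
// Assumes the appended text is still at the end of the buffer.
class RemoveSuffix : ICommand {
    private readonly StringBuilder document;
    private readonly string text;
    private readonly ICommand redo;
    public RemoveSuffix(StringBuilder document, string text, ICommand redo) {
        this.document = document;
        this.text = text;
        this.redo = redo;
    }
    public void PerformAction() { document.Length -= text.Length; }
    public ICommand UndoCommand() { return redo; }
}

// Hypothetical history: runs commands and tracks what can be undone or redone.
class CommandHistory {
    private readonly Stack<ICommand> undoStack = new Stack<ICommand>();
    private readonly Stack<ICommand> redoStack = new Stack<ICommand>();

    public void Execute(ICommand command) {
        command.PerformAction();
        undoStack.Push(command.UndoCommand());
        redoStack.Clear(); // a fresh action invalidates anything previously undone
    }
    public void Undo() {
        if (undoStack.Count == 0) return;
        ICommand undo = undoStack.Pop();
        undo.PerformAction();
        redoStack.Push(undo.UndoCommand());
    }
    public void Redo() {
        if (redoStack.Count == 0) return;
        ICommand redo = redoStack.Pop();
        redo.PerformAction();
        undoStack.Push(redo.UndoCommand());
    }
}
```

The history never needs to know what any command does: it only pushes and pops ICommand objects, which is why the same class would serve every command in the application.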

Serializing Commands

It's often an excellent idea to make your command objects serializable, both in the sense of supporting .NET serialization and also in the more general sense of being able to round-trip the object to the database.  This lets you do the following:

  • Provide a complete audit log of your user's actions.
  • Better still, replay a user's session.
  • Analyze usage patterns to spot common behaviours.
  • Distribute your application across multiple servers, probably by using a publish/subscribe network such as nServiceBus.

Using Commands to make Transactions

One obvious model for implementing rollback is the Memento pattern.  However, it's also possible to use the Command pattern for this purpose.  The following code illustrates how:

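void PerformTransaction(IEnumerable<ICommand> commands) {
    Stack<ICommand> executedCommands = new Stack<ICommand>();
    foreach (var command in commands)
        try { command.PerformAction(); executedCommands.Push(command); }
        catch {
            foreach (var commandToRollback in executedCommands) // most recent first
                commandToRollback.UndoCommand().PerformAction();
            throw; // stop processing and let the caller see the failure
        }
}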


We've covered the two main patterns to implement reversibility.  In practice, the Memento pattern is often quite brittle.  Changes in the behaviour of related objects could lead to changes in what has to be stored.  This is less likely to happen with the Command pattern, since it separates out the responsibility for reversibility into the relevant transitions.  Furthermore, command objects can be extended in further directions, taking in permissions, interruptibility, batching, to name but a few.

So, why isn't it more heavily used?  Why is it that I always see UIs that desperately try to solve the problems the Command pattern solves in ad hoc and incomplete ways?  I think the problem is that it's quite hard to refactor to use this pattern:  there are just too many parts of your code affected by pulling the command concerns together.  This makes it all the more important that you design in Commands right from the start of your project.

I'll be frank, I don't like the Visitor Pattern.  It's a hack.  It's just a way of getting around a deficiency in the language.  Basically, extending the functionality of objects is what inheritance is for.  The whole reason the visitor pattern exists is to deal with the times that this model falls down.  Proxies and the decorator pattern could also be considered object extension mechanisms, but I've already dealt with them.  Another reason I don't like it is that it relies fairly heavily on abusing function overloading, and it's extremely brittle with regards to changes in your inheritance structure.

A note: it is often assumed that the visitor pattern has something to do with iteration and trees.  Whilst it can be used in such scenarios, it’s not really the point and often there’s a simpler solution.  So, what I'm going to talk about is double dispatch.

There's basically two limitations to virtual methods:

  • You can't add them to an existing class.
  • Sometimes you don't want to add them to an existing class.  This is usually because doing so would violate the single responsibility principle.

The visitor pattern is a way around this limitation, but it's not elegant.  What's worse, it requires you to be able to modify the target classes, so it doesn't even fully address the first limitation. Whilst it can be useful, it's always worth examining exactly why you need a visitor.  It can be a code smell.

Implementing the Visitor Pattern

Let's say that we wish to write a routine that processes messages in a trading system.  We'll say for the sake of argument there are order messages, execution messages and allocation messages.  Now, the "object orientated" way of doing this would be to add a processing method directly to each class, but that isn't always possible or desirable in C#.

So, we define a "visitor" interface

interface IMessageVisitor {
    void Visit(OrderMessage message);
    void Visit(AllocationMessage message);
    void Visit(ExecutionMessage message);
}

and we add a method to the IMessage interface.

void Visit(IMessageVisitor visitor); 

It's probably worth defining an interface IVisitable<TVisitor> for this purpose. 

internal interface IVisitable<TVisitor> {
    void Visit(TVisitor visitor);
}

We now implement the following method in every one of our target classes:

public override void Visit(IMessageVisitor visitor) { visitor.Visit(this); }

Note that you can't implement this code just once in a base class, because it won't work: overloads are chosen at compile time from the static type of "this", so a base class implementation would only ever look for an overload that accepts the base class.  What this code does is to abuse function overloading.  If the message is an allocation message, it will call the "Visit Allocation Message" routine.  If you've got an "Automated Allocation Message" that inherits from allocation message, it'll call the same routine.  The same semantics, in other words, as a virtual function.

If, on the other hand, you wanted to specialize the "Automated Allocation Message", you'd need to change the IMessageVisitor.  It's not a perfect solution.
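
To see the double dispatch end to end, here is a minimal sketch that puts the pieces together.  The abstract Message base class and the RoutingLog visitor are assumptions made for the example; the message classes are empty because only their types matter here:

```csharp
using System.Collections.Generic;

interface IMessageVisitor {
    void Visit(OrderMessage message);
    void Visit(AllocationMessage message);
    void Visit(ExecutionMessage message);
}

interface IMessage {
    void Visit(IMessageVisitor visitor);
}

abstract class Message : IMessage {
    // Deliberately abstract: a shared body would only ever see "this" as a Message.
    public abstract void Visit(IMessageVisitor visitor);
}

class OrderMessage : Message {
    public override void Visit(IMessageVisitor visitor) { visitor.Visit(this); }
}
class ExecutionMessage : Message {
    public override void Visit(IMessageVisitor visitor) { visitor.Visit(this); }
}
class AllocationMessage : Message {
    // "this" is statically an AllocationMessage here, which is what picks the overload.
    public override void Visit(IMessageVisitor visitor) { visitor.Visit(this); }
}
// Inherits Visit unchanged, so it is routed as a plain AllocationMessage.
class AutomatedAllocationMessage : AllocationMessage { }

// Hypothetical visitor that records which overload each message reached.
class RoutingLog : IMessageVisitor {
    public readonly List<string> Entries = new List<string>();
    public void Visit(OrderMessage message) { Entries.Add("order"); }
    public void Visit(AllocationMessage message) { Entries.Add("allocation"); }
    public void Visit(ExecutionMessage message) { Entries.Add("execution"); }
}
```

The caller only ever holds an IMessage and calls message.Visit(visitor); the first dispatch is the virtual call on the message, the second is the overload picked inside each override.  (The Gang of Four book names the method on the target Accept rather than Visit; the naming here follows the text above.)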

Alternatives to the Visitor Pattern

It's worth noting that many modern "typeless" programming languages allow you to add methods directly to classes at runtime.  This provides a strong alternative to the visitor pattern.  It doesn't violate the single responsibility principle as long as you segregate the scope of the routines.  If you can't modify the target classes, a (carefully written) big ugly cascading if statement can be used instead of implementing IVisitable.  Finally, you could use a chain of responsibility instead, which is effectively a well-structured cascading if statement, but a lot more flexible. 

The catch is: none of the above solutions gets you the magic static type checking of the visitor pattern.

In general terms, if you've only got a small number of implementations of a given IVisitor class, you should probably just consider adding virtual functions directly to the calling classes.  If, on the other hand, you have fifty, the visitor pattern may be pretty much the only way to keep your problem space manageable.

Tree Walking

The classic Gang of Four example of tree walking is actually a mix of the visitor pattern and the composite pattern.  With this, the parent node's visit method automatically calls visit on its children.  Seriously, just don't do this unless you absolutely have to.  You've mixed the dispatch behaviour with iteration behaviour, there's no way for the caller to figure out the structure of the tree and you can't vary between depth and breadth first iteration.  Here's some code that's often more useful than doing the composite and visitor trick:

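using System.Collections.Generic;
using System.Linq;
interface INode<TNode> {
    IEnumerable<TNode> Children { get; }
}
IEnumerable<TNode> DepthFirst<TNode>(TNode root)
where TNode : INode<TNode> {
    // Concat rather than Union: Union would silently drop nodes that compare equal.
    return new[] { root }.Concat(root.Children.SelectMany(c => DepthFirst(c)));
}
IEnumerable<TNode> BreadthFirst<TNode>(TNode root)
where TNode : INode<TNode> {
    return BreadthFirst<TNode>(new[] { root });
}
IEnumerable<TNode> BreadthFirst<TNode>(IEnumerable<TNode> level)
where TNode : INode<TNode> {
    // Emit the whole level, then recurse on the next level down until one is empty.
    if (!level.Any()) return Enumerable.Empty<TNode>();
    return level.Concat(BreadthFirst<TNode>(level.SelectMany(c => c.Children)));
}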

UPDATE:  This article used to contain text about the composite pattern, which I’ve removed.  You can find my revised thoughts about composite pattern here, or the original text with editorial here.

The patterns book is 15 years old, that's about 150 in developer years.  All told, it's amazing that more of the patterns aren't out of date.  Here are some that I think could be safely retired.  Again, they're not necessarily bad ideas, it's just that they're special cases of more general principles, and I favour understanding the principles.


I'll be honest, I'm not sure I get the point of the Bridge pattern.  A bridge is exemplified by the following example from finance:

  • You have an interface for a traded instrument.  Call it IInstrument.
  • Shares are one of the simplest kinds of instrument.  We extend the interface to IEquity.
  • We provide a base implementation of Instruments that developers can inherit from.  We call this InstrumentBase.
  • We implement shares using the concrete class Equity which inherits from InstrumentBase.

In this case, we could modify the IInstrument interface.  This would affect the InstrumentBase classes, but not the Equity class.  Equally, we could modify the implementation of Instrument without modifying the external interfaces.

Now, I have a problem with all of this.  Basically, as far as I can tell, the Bridge pattern is what I call object-orientated development.  There's two components of the pattern:

  • The use of interfaces to shield implementations from the consumers.  This is the Open/Closed principle.
  • The use of inheritance to specialize.  This is also known as coding in C#.

In short, Bridge doesn't say anything other than "built according to good design principles".

Template and Strategy

Template patterns are the same as Bridge patterns, only the emphasis is different.  Rather than specializing an entity, you're specializing an algorithm.  Now, the interesting thing here is: there's two ways to create a general algorithm with specialization.

  • Inheritance (which the Template pattern requires)
  • Composition (in which case you call it a Strategy pattern)

Typically, I'd favour the latter, as would Gamma et al. 

Usually, in C#, you can generalize an algorithm just by passing in a function.  For instance, consider the following code:

gofPatterns.FirstOrDefault(pattern => pattern.Name == "Template")

FirstOrDefault is a parameterized algorithm.  The expression "  pattern => pattern.Name == "Template"  " specializes it.  Implementing the same algorithm through inheritance would just be plain painful.  In short, Bridge and Template are both names for specific cases of more general problems, and the specialization isn't useful.  So, I wouldn't really recommend using the terminology.  That's not to say that you won't use the pattern, just that you'll probably not help anyone by telling them you used a template pattern when you did so.

In practice, I have been known to use the term "Strategy Pattern", but usually I just mean "I outsourced some decision making".  In general I think you're better off just understanding that there are multiple ways of parameterizing an algorithm.
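
To make the three forms of parameterization concrete, here is a small sketch of the same trivial algorithm written each way.  Every name in it (ReportTemplate, IHeaderStrategy, StrategyReport, Report.Render) is made up for illustration:

```csharp
using System;

// Template pattern: the algorithm lives in a base class, the varying step in a subclass.
abstract class ReportTemplate {
    public string Render(string body) { return Header() + body; }
    protected abstract string Header();
}
class DraftReport : ReportTemplate {
    protected override string Header() { return "DRAFT: "; }
}

// Strategy pattern: the varying step is an object handed to the algorithm.
interface IHeaderStrategy {
    string Header();
}
class DraftHeader : IHeaderStrategy {
    public string Header() { return "DRAFT: "; }
}
class StrategyReport {
    private readonly IHeaderStrategy strategy;
    public StrategyReport(IHeaderStrategy strategy) { this.strategy = strategy; }
    public string Render(string body) { return strategy.Header() + body; }
}

// Plain function: the varying step is just a delegate argument.
static class Report {
    public static string Render(Func<string> header, string body) { return header() + body; }
}
```

All three produce the same output for the same input; what differs is how much ceremony it takes to swap the header, and only the first forces you to create a new subclass of the algorithm itself.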


Basically, a flyweight object is a representation of an entity which strips out most of the information and just keeps the truly vital information.  This is done for memory conservation.  I'm including this under "obvious" because it will have occurred to anyone who has run into the problem it solves (and because I can't think of a better place to put it).  You could also describe this as a "reference" pattern, since often the flyweight objects contain just enough information to find the true entity in a database if needed.

An interesting case is lazy loading in NHibernate.  NHibernate generates proxy objects which are themselves initially flyweight, but become heavyweight on their first access.

Again, I don't think this is particularly useful terminology.  In the time since Design Patterns was written, the whole idea of a "true" object representation has declined in importance.  Finely-grained interfaces customized by consumer, pervasive proxying, presentation models are all part of a general principle that objects matter at point of use.  The world has moved on.


Bridge, Template and Strategy are also obvious patterns, but unlike Facade, Builder and Adapter, aren't really useful for communication. 

  • In the case of Bridge, typically just saying that you've a) hidden something behind an interface or b) used inheritance to specialize is more clear. 
  • In the case of Template or Strategy, it's better to refer to what you're doing as a parameterized algorithm, irrespective of the form that parameterization takes.
  • "Strategy Pattern" is nonetheless a popular term, and I'm not advocating getting into a terminology war with someone who wants to call what they're doing a strategy pattern.  But you might wish to introduce them to Erlang sometime...

The Flyweight pattern is part of a general principle that objects can have more than one representation.  I wouldn't personally use the term.

Okay, we're onto the first of what I described as the "stone cold brilliant patterns".  This particular one looks obvious, but its benefits are actually quite subtle.  There are basically only two ways of creating an object in .NET:

  • Call new
  • Call a function that calls new

The factory pattern is basically just using the latter.  Now, this might strike you as an unnecessary level of indirection, and you might be right.  You'd be right if:

  • The class is a pure data transfer object.
  • The class doesn't have a complex data structure.
  • The class has no functional logic.

I'd rather not talk about testing too much, but what this comes down to is "The object could be easily created in a test."  Now, if you're not that into testing (because, I don't know, maybe you like bugs...) you might be wondering why I'm going on about testing.  Well, testing is re-use.  These objects are what I term "Leaf Objects".  They're the basic values in our system.  Everything else, you need a factory.

The whole point of the pattern is to avoid the basic drawback of new: that it isn't polymorphic.  You can't call new Employee() and receive a MicrosoftEmployee object.  And even if you don't want that to happen, one day you probably will wish you could.

Incidentally, don't create a static factory method.  If you do, you've missed the whole point.  The same goes for putting significant logic into your factory.  Logic mixed with object creation is inflexible.  In C++, you can overload the behaviour of the new operator.  This, sadly, isn't as useful as it first appears, since again it's a static operation and cannot be varied according to runtime context.  There's no way to achieve the design benefits of the factory pattern.

The Difference between Factory and Abstract Factory

To be honest, I don't really think the distinction between factory and abstract factory is useful:

  • Factory Method.  You declare a variable of type Func<>.  There's not much more to it, really.
  • Abstract Factory:  You declare an interface in which every method returns or creates an object

It's the difference between one function and multiple functions.
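
A small sketch of that difference, using the Employee example from above.  HiringService, IStaffFactory and MicrosoftStaffFactory are hypothetical names, and the Employer property is only there so the result can be observed:

```csharp
using System;

class Employee {
    public virtual string Employer { get { return "Unknown"; } }
}
class MicrosoftEmployee : Employee {
    public override string Employer { get { return "Microsoft"; } }
}

// Factory method in the sense used above: the consumer holds a Func<Employee>
// and never calls new itself, so the concrete type can vary at runtime.
class HiringService {
    private readonly Func<Employee> createEmployee;
    public HiringService(Func<Employee> createEmployee) { this.createEmployee = createEmployee; }
    public Employee Hire() { return createEmployee(); }
}

// Abstract factory: the same idea with more than one creation method.
interface IStaffFactory {
    Employee CreateEmployee();
    Employee CreateContractor();
}
class MicrosoftStaffFactory : IStaffFactory {
    public Employee CreateEmployee() { return new MicrosoftEmployee(); }
    public Employee CreateContractor() { return new Employee(); }
}
```

Constructing the service as new HiringService(() => new MicrosoftEmployee()) gets you the polymorphic creation that new Employee() can't, and a test can pass in a lambda that returns whatever stub it likes.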


I don't honestly think I've done this pattern justice.  It's probably the single most important design pattern we've got.  It falls into an interesting category: one where it's obvious what you do, but much less obvious when you should use it.  It took me years to appreciate how important it was to well-designed code.  I started to understand it through reading Miško Hevery's work and applying that to my own experience of where my own testing approach was falling down. 

I guarantee, you already know these patterns.  However, the patterns terminology is useful, if only to communicate the concepts quickly.


An object used to build another object.  The most obvious implementation of this pattern in the framework is StringBuilder.  It can be quite useful to have a builder in cases where what you're constructing is complex and you don't need to read from the constructed object as you're going.  (If you do, just using the object's own method is often simpler.)  The Builder pattern is used in fluent interfaces to support method chaining.  In this case, the builder constructs the object internally, but returns itself.

Arguably, this pattern should be marked "Caution: Hazardous Material".  Although everyone is familiar with it, every so often someone gets it into their head that every object should have a builder.  The code rapidly becomes a mass of useless indirection.


You want to expose one interface from an object, but it exposes a different interface.  You write an adapter object to translate calls from one to the other.  Differs from the proxy pattern only in as much as the proxy pattern mandates that the exposed and the internal interface should be the same, but it's definitely a proxy in the looser sense that we commonly use.

Adapters usually occur at sub-system or system boundaries.  Third-party libraries should typically be wrapped.


Now, this pattern is so general it's going to cover a lot of code you've written over time, but the terminology is actually useful, simply because we do actually need a word for it.  Say you've got an extremely complex trading system.  However, all that your code needs is a list of accounts and the cash in each.  A Facade is an interface that just exposes the bit you need. 


Facade, Builder and Adapter are amongst the most common patterns in software development.  Most developers will have independently come up with these solutions, since they're pretty obvious.  The terminology can, however, be useful to communicate between developers.

Bridge and Template are also obvious, but unlike Facade, Builder and Adapter, aren't really useful for communication. 

  • In the case of bridge, typically just saying that you've hidden something behind an interface is more clear. 
  • In the case of Template, it's better to refer to what you're doing as a parameterized algorithm, irrespective of the form that parameterization takes.

The Flyweight pattern is part of a general principle that objects can have more than one representation.  I wouldn't personally use the term.

There's one more "obvious" pattern: factory.  However, when you should use it isn't as obvious, so it's getting a post of its own.

These are basically patterns you're better off avoiding.  Seriously, just skip those parts of the book:


All patterns are a trade-off.  The trade-off with singleton is reducing the number of parameters certain functions take and trading them for more static methods.  From the perspective of code agility, every method on a singleton might as well be a static method.  Not only that, but everything the singleton touches might as well be a static method.  The testing implications of the Singleton pattern are horrendous.  Steve Yegge has written far more than I intend to on the subject, so I'm really not going to spend any more time on telling you what a stupid, stupid idea it is.  Singletons, static methods and global variables all expose the same problem.  In fact, they're pretty much effectively the same thing most of the time.

So, what should you do instead of a singleton?  Well, let's say, for instance, that you have a network connection (call it X) wrapped as an XConnection object.  There's only one physical connection, and that needs to be accessed from various parts of the system.  Well, actually, the answer is very simple: you pass the object into the constructor of the objects that need it.  If some of those objects get created before the network connection gets called, you could pass in an object that creates the connection on the first call, or you could pass in a promise/lazy object/future/whatever it's called in the language of your choice.  Basically, take the static methods you were thinking of, and turn them into instance methods.
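
As a sketch of what that looks like, assuming a hypothetical IXConnection interface and PriceFeed consumer (the text only names XConnection), with Lazy<T> from .NET 4 standing in for the promise/lazy object:

```csharp
using System;

interface IXConnection {
    string Send(string request);
}

// Stand-in for the real wrapper around the one physical connection.
class XConnection : IXConnection {
    public string Send(string request) { return "ack:" + request; }
}

// Takes the connection through its constructor instead of reaching for a static.
class PriceFeed {
    private readonly IXConnection connection;
    public PriceFeed(IXConnection connection) { this.connection = connection; }
    public string Subscribe(string symbol) { return connection.Send("subscribe " + symbol); }
}

// For consumers built before the connection exists: creation is deferred to first use.
class LazyXConnection : IXConnection {
    private readonly Lazy<IXConnection> inner;
    public LazyXConnection(Func<IXConnection> connect) { inner = new Lazy<IXConnection>(connect); }
    public string Send(string request) { return inner.Value.Send(request); }
}
```

PriceFeed can't tell whether it was handed the real connection, the lazy wrapper or a test fake, which is exactly the flexibility the static accessor takes away.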

Now, the technique that we've just described is called Dependency Injection (DI), and is made easier by using a DI container.  Ironically, you tend to tell the container that the connection object has a "Singleton" lifecycle.  But all that means is that there's one connection referenced by the container, not one connection usable in the whole of your memory space.  The distinction may appear small, but it's the difference between a flexible design and a brittle nightmare.  Much as certain extremely talented people may use the technique, don't make your container a singleton.  That way lies madness.

Singletons are, of course, a special case of the concept of shared state.  In general terms, state, and especially shared state, is problematic.  Designs that share very little state or, even better, have no state at all tend to be less prone to hard-to-analyze bugs and easier to repurpose when requirements change.


This one basically says you shouldn't create your objects, you should copy them.  We already know this concept in .NET, it's called ICloneable.  There's all sorts of problems with cloning objects, number one of which is that it's not that well defined.  If X has Y as a property, when you clone X, do you clone Y?  Seriously, semantically, the prototype pattern is a mess.

The BCL team were talking about demising ICloneable five years ago

The thing with prototype is: you can see how it might look sensible if you're a C++ programmer.  Copy means something quite specific: a memory-based clone.  Any good C++ developer would know exactly what that would do.  It'd copy things declared as values and re-use things declared as pointers.  Let's just think about that for a second: what happens when your data structure changes a value to a reference?  Well, the semantics of your copy change.  And, of course, you can't copy anything that uses an external resource.

Now, JavaScript is based around the idea of prototypes instead of classes.  These are not the same thing as the prototype pattern. 

On the other hand, it's important to note that sometimes context is everything. 


The Singleton and Prototype patterns have no place in the arsenal of the educated C# developer.  Prototype just makes no real sense, Singleton will positively ruin your code.  Most people know that singleton is a bad idea, but are still enamoured with the concept of shared state.  On the other hand, I don't think anyone is seriously attempting to use the prototype pattern in C# in the first place.

EDIT: Thanks to Svend Koustrup for pointing out a dangling sentence.

Previously, I talked about patterns where you were unlikely to implement them yourself, since they are now considered part of the infrastructure that goes with a modern language.  The publish/subscribe model isn't quite ready to be described as infrastructure, but it's getting close.  So, here are two patterns I don't honestly think it's worth knowing.  Here, it's not so much that the patterns are bad, just that you're better off understanding and using the publish/subscribe model.  Here's how the pub/sub model works:

  • Participants are called services.
  • They do not communicate directly with one another, but through messages/events raised by one and consumed by others.
  • Each service can register its interest in the messages for any other participant.
  • However, no service references any other participant.  Instead, messages are placed on a message bus, and consumed by means of method calls on the destination object or objects.
  • In asynchronous scenarios, when an event is raised, it is placed on a queue.  The service consumes the messages in the order they were sent, but not necessarily on the same box and definitely with no guarantee of exactly when it happens.
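
To make the shape concrete, here is a deliberately tiny in-process bus.  It is synchronous and single threaded, and it is an illustration only, not the API of any real pub/sub library; MessageBus, OrderPlaced, OrderService and AuditService are all invented names:

```csharp
using System;
using System.Collections.Generic;

// Minimal bus: handlers are stored per message type and invoked in subscription order.
class MessageBus {
    private readonly Dictionary<Type, List<Action<object>>> handlers =
        new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<TMessage>(Action<TMessage> handler) {
        List<Action<object>> list;
        if (!handlers.TryGetValue(typeof(TMessage), out list)) {
            list = new List<Action<object>>();
            handlers[typeof(TMessage)] = list;
        }
        list.Add(message => handler((TMessage)message));
    }

    public void Publish<TMessage>(TMessage message) {
        List<Action<object>> list;
        if (!handlers.TryGetValue(typeof(TMessage), out list)) return; // nobody is listening
        foreach (var handler in list) handler(message);
    }
}

class OrderPlaced {
    public string Symbol;
}

// Two services that know about the bus and the message type, but not about each other.
class OrderService {
    private readonly MessageBus bus;
    public OrderService(MessageBus bus) { this.bus = bus; }
    public void Place(string symbol) { bus.Publish(new OrderPlaced { Symbol = symbol }); }
}
class AuditService {
    public readonly List<string> Seen = new List<string>();
    public AuditService(MessageBus bus) { bus.Subscribe<OrderPlaced>(m => Seen.Add(m.Symbol)); }
}
```

Neither service holds a reference to the other, so either can be replaced, removed or tested alone.  An asynchronous version would put the message on a queue in Publish rather than calling the handlers directly.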

Now, here's some implementations of the pub/sub pattern you can use for free:

Of these, each has their advantages.  Personally, I dislike .NET events because getting a message from one service to another pretty much requires you to be able to reference both objects, which cuts down on the decoupling that pub/sub gives you.  Of the others:

  • nServiceBus is most appropriate for long running processes, and those that split across machines. 
  • retlang is preferable when you've got low-latency requirements, and you're happy with everything running on one machine.
  • Udi Dahan's code is incredibly short, and appropriate when you don't feel the need for a more complex solution.  It is synchronous and single threaded, which can be a blessing or a curse depending on your context.

Now, I was promising that I'd talk about GoF patterns, so let's get back to it:


One object listens to events/messages pushed out by another.  Any consuming service in a pub/sub framework could be considered an observer, as could anything that listens to .NET events.  However, its very generality means that it's not really that useful as a concept.


A term used to describe an object that is used when services don't directly communicate with one another.  In the publish/subscribe model, this is the message bus. 


The Observer and Mediator patterns should not be implemented directly by your code.  If it looks like one or the other might be appropriate, you should consider the use of a publish/subscribe framework.

NOTE:  I’m going to have to revise this in light of Rx, a useful framework that targets the observer pattern directly.

Some patterns are so good they become part of the way that people think about systems.  They get implemented using general solutions and then people just use them as a piece of technology.  In these cases, a modern developer is himself unlikely to implement the pattern, he's much more likely to just use the libraries that implement the pattern.  I'm terming these infrastructure patterns.

Proxy and Decorator

A modern developer lives with proxy objects the whole time.  If you have an object X, a proxy for object X is an object that exposes the same interface as X and delegates calls through to X.

In many ways, Decorator and Proxy are the same pattern.  If you don't believe me, check the UML diagrams on Wikipedia (this from the guy who hates UML).  The major difference is one of intent.  We call it a decorator if it modifies functionality, or a proxy if it doesn’t. 

You can do quite a few things with a proxy/decorator:

  • NHibernate uses entity proxies to allow objects to be lazily loaded.
  • You could use Castle DynamicProxy to log all calls to a destination object.
  • Equally, you could use a proxy to restrict access and add a permissioning system.
  • You use client proxies in WCF to make it look like you're calling a service directly.  In practice, you call a proxy, the proxy performs network communication with the server and then the server calls the real object.

In practice, quite a lot of the time the boundary between the two terms is blurred.  It’s usually called a proxy if you’re crossing a system boundary, but if you’re checking permissions?  Strictly it’s a decorator.
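
A hand-written sketch of the two, for an interface small enough that writing it by hand is tolerable.  IAccountService and all three classes are hypothetical, and the fixed balance is a placeholder:

```csharp
using System.Collections.Generic;

interface IAccountService {
    decimal GetBalance(string account);
}

class AccountService : IAccountService {
    public decimal GetBalance(string account) { return 100m; } // placeholder value
}

// Proxy in the strict sense: same interface, delegates, and only observes the call.
class LoggingAccountService : IAccountService {
    private readonly IAccountService inner;
    public readonly List<string> Log = new List<string>();
    public LoggingAccountService(IAccountService inner) { this.inner = inner; }
    public decimal GetBalance(string account) {
        Log.Add("GetBalance(" + account + ")");
        return inner.GetBalance(account);
    }
}

// Decorator: same shape, but it changes behaviour by refusing some calls.
class PermissionedAccountService : IAccountService {
    private readonly IAccountService inner;
    private readonly HashSet<string> allowedAccounts;
    public PermissionedAccountService(IAccountService inner, IEnumerable<string> allowedAccounts) {
        this.inner = inner;
        this.allowedAccounts = new HashSet<string>(allowedAccounts);
    }
    public decimal GetBalance(string account) {
        if (!allowedAccounts.Contains(account))
            throw new System.UnauthorizedAccessException("No access to " + account);
        return inner.GetBalance(account);
    }
}
```

Because both wrappers expose IAccountService, they stack in any order, and the consumer is written against the interface without knowing how many layers sit in front of the real service.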

In practice, most of your needs for true proxy objects are handled by framework libraries.  Those that aren't, I recommend using a general proxy solution like DynamicProxy.  Krzysztof has written an excellent tutorial.  However, just because you're not implementing proxies yourself, it doesn't mean you don't need to know what one is.  On the other hand, most modern developers do understand this stuff already, because much of what we write these days involves some form of remote component.

The classic example of the decorator pattern in the .NET Framework is the IO library (it pretty much copies Java's approach).  In the call "new BufferedStream(stream)", the BufferedStream is a decorator that buffers the calls to the underlying stream.  It's exactly the same as the original stream, except that it's buffered. 

For large APIs such as System.IO.Stream, it's rarely advisable to do this stuff by hand, because delegating methods is both dull and error prone. 


Steve Yegge pointed out that this pattern basically just works around a deficiency in the design of C++.  In C#, it's the foreach statement.  Call it iterator if you want, I tend to call it IEnumerable.


An interpreter executes a language without going through a compilation stage.  The interpreter pattern is basically the same.  Now, there's basically two cases in which you'd be hit with this as a requirement:

  • You've been provided with an external language specification that you need to parse and process.
  • You need to create a small language to deal with a particularly flexible requirement.

In the first case, it doesn't matter what you do, you're writing an interpreter or compiler.  Since anything you did could be described as matching the interpreter pattern, it's hard to see how useful the terminology is.  Interpreter was a term of art in the 1960s, not a design pattern in the 80s.

In the second case, you're writing a DSL.  And here, frankly, I wouldn't use an interpreter.  Most modern thinking on this subject says that you're better off using a language that already exists and tweaking it to your purpose. 

The most obvious language choices are IronPython, IronRuby or Boo. 

  • IronPython has a relatively inflexible syntax, but it's extremely sophisticated and can be happily used for scripting purposes. 
  • IronRuby (and Ruby in general) is the most popular DSL base language in the world, with a very flexible syntax.  It's not as mature as IronPython, though, which may put you off.
  • Boo has a customizable syntax, is mature and is open source.  However, its syntax is possibly over-flexible, to the extent that it can be very hard to figure out what a DSL does without reading its source code.  You can mitigate this by writing decent documentation, but typically DSLs don't come with much.


The iterator and interpreter patterns are both terminology that ought to be retired.  This is not to say they're bad patterns, it's just that it's not useful to name the concepts anymore.  The term "interpreter" is best used simply to denote the same concept it represented in the 1960s: runtime evaluation of a programming language.  The proxy pattern is something you're highly unlikely to need to implement yourself, but it's extremely important you understand the concept it represents and when your code is using proxies.  Decorator is just a proxy that modifies behaviour.  You’re actually quite likely to implement your own decorators, though.

UPDATE:  The text about decorators has been revised.  The original text can be found here.

I'm about to give a talk on Gang of Four patterns.  I've got to admit, I'd rather have been talking about the seminal post-punk band.  For one thing, if people disagree with my opinions ("Entertainment" is genius, for the record...) they're unlikely to take it personally.  The thing is, everyone treats the book as being something akin to the Bible in the 16th century:

  • It's incredibly important
  • It's full of useful ideas
  • It's hugely respected
  • But no-one's read it
  • And a fair number of those that do don't understand it, and take it out of context.

None of these are actually problems with the book per se, but misinterpretation is annoyingly common.  You hear people boasting of how many different patterns they managed to implement in their code base, as if increased complexity were something of which to be proud.  You see people shoehorning patterns into problems which don't match the use cases.  Personally, I wish more people bothered to read Chapter 1, especially the section on favouring composition over inheritance.

Taking Another Look

I have to admit, these attitudes coloured my own stance on the book for years.  I was leery of developers who espoused patterns because it seemed that use of patterns was more important to them than delivering solutions.  I'm hardly the first person to say this.

Like the Bible, it's better when you have a modern translation, since frankly 15 years is a long time in software development.  To give you an idea, Java was released a year after the book.  Some of the patterns were a stupid idea, some just aren't as important anymore, some are special cases of more general ideas, and some are things people use all the time, so the only real benefit is the terminology.  And some...

...well actually, some are so important that you really, really, need to understand them.  So, I decided to write a skeptic's guide to design patterns.  You're almost certainly going to disagree with me, but for the record, the patterns I think are still relevant and unobvious today are:

  • Factory / Abstract Factory
  • Command
  • Chain of Responsibility
  • State

Sorry, that's it.  Don't worry, I'm wearing asbestos.

Initial Observations

Going through the book again, a couple of themes keep cropping up:

  • Unlike most modern patterns, the use cases often overlap.  Because the focus is on low-level programming constructs, there's often more than one way to do something.
  • The design patterns are, contrary to popular belief, extremely language dependent.  Iterator is irrelevant in C#, Visitor is irrelevant in JavaScript.
  • The patterns are often special cases of more general concepts and techniques.
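
By way of illustration, here's roughly all it takes to get an iterator in C#.  Countdown is a made-up example; the point is only that yield return has the compiler build the iterator object that the book has you write by hand, and foreach consumes it.

```csharp
using System.Collections.Generic;

// Hypothetical example class, invented for this sketch.
static class Countdown {
    // The compiler turns this method into an iterator state machine;
    // no hand-written iterator class is needed.
    public static IEnumerable<int> From(int start) {
        for (int i = start; i > 0; i--) {
            yield return i;
        }
    }
}
```

Calling `foreach (int n in Countdown.From(3))` visits 3, 2 and 1 in that order, without the calling code ever naming an iterator type.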

You're not going to find any diagrams in this stuff.  Frankly, I think that UML is a positive hindrance to understanding what's going on in most of these patterns.

  • For one, I don't regard the presence or absence of an abstract interface as fundamentally changing a design; it's just a refinement (one that you should pretty much always be using).
  • For another, the difference between composition and inheritance is significant, but I'd argue it doesn't actually change a pattern.  Hence, I'd argue that Template Method and Strategy are actually the same.
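
In place of a diagram, here's a small sketch of the Template/Strategy claim.  All the names are hypothetical.  Both versions render a report with a varying header; the only difference is whether the varying step arrives through a subclass or through a collaborator.

```csharp
using System;

// Template Method: the varying step is supplied by a subclass (inheritance).
abstract class ReportTemplate {
    public string Render(string body) {
        return Header() + "\n" + body;
    }

    protected abstract string Header();
}

class MonthlyReport : ReportTemplate {
    protected override string Header() { return "Monthly"; }
}

// Strategy: the same varying step is supplied by a collaborator (composition).
// A delegate stands in for the strategy interface here.
class Report {
    private readonly Func<string> header;

    public Report(Func<string> header) { this.header = header; }

    public string Render(string body) {
        return header() + "\n" + body;
    }
}
```

`new MonthlyReport().Render("body")` and `new Report(() => "Monthly").Render("body")` produce the same text.  The composed version can swap its header without a new class, which is the practical reason to favour it.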

Retlang 0.4 has been out for quite a while, but I've never written about it.  Worse, my example code doesn't work in it, which has garnered complaints (in my defence, the Retlang wiki is out of date as well).  The version is again a restructuring that you shouldn't apply to your own code without a bit of thought, but it's added a killer feature: the ability to interact with WinForms.  Or WPF if you're that way inclined.

Anyway, I've drawn up a new version of the spider, but it differs quite a bit from the previous versions:

  • It now has something resembling an architecture.
  • Keeping track of progress is now handled separately from keeping track of which URLs have been encountered.
  • I've used the ability to throw a command directly onto a fiber.  (Note for old users:  where you used to say "Context" or "Queue", you now say "Fiber")
  • I've abstracted out the basic graph-walking functionality from the URL reading functionality.  It's not clear to me that it's more readable this way, although the advantages of abstraction tend to grow with scale.
  • If you pass an argument of "interactive" in, it will show a progress bar.

I still haven't gone with Mike Rettig's suggestion to group channels together into service classes, which are then passed into the constructors of their consumers.  Instead, I've mostly gone for a single top-level routine that sets up the whole routing.  However, this ran into trouble when dealing with the new feature, the ability to switch on a progress bar.

Basically, UI fibers look the same in Retlang as thread fibers, but ultimately they're not the same.  So the main setup routine needed access to the particular fiber.  I achieved that by actually making the fiber a property of the object.  At one point in the design (yes, there is some design...) this produced a chicken-and-egg problem.

  • I wanted the form to have a fiber property
  • I wanted to pass that into the form's constructor
  • But the fiber itself wanted the form in its constructor

This looked like a serious problem until I realized that there was a better design, which separated out the job of tracking progress from reporting progress.  This is reflective of a more general fact about Retlang: it really punishes you sometimes if your design isn't SOLID.  This isn't a criticism.  In fact, it's one of the best features of Retlang: it pushes you towards better designs.

Anyway, enough rambling and on with the code.  Feedback on whether or not this is better at illustrating principles than the old version will be appreciated.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;
using System.Threading;
using System.Windows.Forms;
using Retlang.Channels;
using Retlang.Core;
using Retlang.Fibers;

delegate IEnumerable<TNode> NodeWalker<TNode>(TNode root);

interface IProgressCounter<TNode> {
    void NodeQueued(TNode node);
    void NodeCompleted(TNode node);

    IFiber Fiber { get; }

    void WaitUntilComplete();
}

interface IGraphListener<TNode> {
    bool HasEncounteredNode(TNode node);

    void ProcessNode(TNode node);
}

class GeneralGraphListener<TNode> : IGraphListener<TNode> {
    private readonly HashSet<TNode> alreadyProcessed;

    public GeneralGraphListener(IEqualityComparer<TNode> comparer) {
        alreadyProcessed = new HashSet<TNode>(comparer);
    }

    public virtual bool HasEncounteredNode(TNode node) {
        return alreadyProcessed.Contains(node);
    }

    public virtual void ProcessNode(TNode node) {
        // Assumed body: record the node so that HasEncounteredNode
        // and NodesEncountered can see it.
        alreadyProcessed.Add(node);
    }

    public IEnumerable<TNode> NodesEncountered {
        get { return alreadyProcessed; }
    }
}

static class RetlangMacros
    internal static Action<TValue> Distribute<TValue>(Func<Action<TValue>> processorFactory, int numberOfProcessors)
        var distributeChannel = new QueueChannel<TValue>();
        for (int index = 0; index < numberOfProcessors; index++) {
            var processor = processorFactory();
            var queue = new ThreadFiber();
            distributeChannel.Subscribe(queue, processor);
        return distributeChannel.Publish;

    internal static IPublisher<TNode> CreateGraphWalker<TNode>(
        Func<NodeWalker<TNode>> nodeWalkerFactory, 
        int numberOfProcessors, 
        IGraphListener<TNode> listener,
        IProgressCounter<TNode> progressCounter)
        var foundNodeChannel = new Channel<TNode>();
        var enqueuedNodeChannel = new Channel<TNode>();
        var finishedNodeChannel = new Channel<TNode>();

        Func<Action<TNode>> nodeProcessorFactory = () => {
            var walker = nodeWalkerFactory();
            return node => {
                foreach (TNode child in walker(node)) {
        var walkChildren = Distribute(nodeProcessorFactory, numberOfProcessors);
        var trackerQueue = new PoolFiber();
        enqueuedNodeChannel.Subscribe(progressCounter.Fiber, progressCounter.NodeQueued);
        finishedNodeChannel.Subscribe(progressCounter.Fiber, progressCounter.NodeCompleted);
        foundNodeChannel.Subscribe(trackerQueue, node =>
            if (!listener.HasEncounteredNode(node)) {

        return foundNodeChannel;

class ProgressCounter<TNode> : IProgressCounter<TNode>
    private readonly IFiber fiber;
    private readonly IProgressReporter reporter;
    private readonly AutoResetEvent waitHandle;
    int nodesEncountered;
    int nodesProcessed;

    internal ProgressCounter(AutoResetEvent waitHandle, IFiber fiber, IProgressReporter reporter)
        this.waitHandle = waitHandle;
        this.fiber = fiber;
        this.reporter = reporter;

    public IFiber Fiber
        get { return fiber; }

    public virtual void NodeQueued(TNode node)

    public virtual void NodeCompleted(TNode node) {
        reporter.Report(nodesProcessed, nodesEncountered);
        if (nodesProcessed == nodesEncountered) {

    ///NOTE that this routine is called on a separate fiber from the other functions in
    ///this class.  All other classes stick to their fibers.
    public virtual void WaitUntilComplete() {

interface IProgressReporter : IDisposable {
    void Report(int nodesProcessed, int nodesEncountered);
}

class ConsoleProgressReporter : IProgressReporter {
    public void Report(int nodesProcessed, int nodesEncountered) {
        Console.WriteLine(string.Format("{0}/{1}", nodesProcessed, nodesEncountered));
    }

    public void Dispose() {
        // Not needed
    }
}

class FormProgressReporter : Form, IProgressReporter {
    private readonly ProgressBar progressBar;

    public FormProgressReporter()
        progressBar = new ProgressBar();
        progressBar.Dock = DockStyle.Fill;
        progressBar.Location = new System.Drawing.Point(0, 0);
        progressBar.Name = "progressBar";
        progressBar.Size = new System.Drawing.Size(292, 266);
        progressBar.TabIndex = 0;
        Height = 75;
        Width = 600;

    public void Report(int nodesProcessed, int nodesEncountered)
        Text = string.Format("{0}/{1}", nodesProcessed, nodesEncountered);
        progressBar.Maximum = nodesEncountered;
        progressBar.Value = nodesProcessed;

class Program
    static void Main(string[] args)
        string baseUrl = "http://www.yourwebsite.xyzzy/";
        int spiderThreadsCount = 15;
        bool hasUI = args.Length > 0;
        if (hasUI)
        var waitHandle = new AutoResetEvent(false);
        var reporter = hasUI
            ? (IProgressReporter)new FormProgressReporter()
            : new ConsoleProgressReporter();
        var form = reporter as FormProgressReporter;
        if (form != null) {
            new Thread(() => Application.Run(form)).Start();
        using (var fiber = hasUI 
            ? new FormFiber(form, new BatchAndSingleExecutor())
            : (IFiber) new PoolFiber())
            var progressCounter = new ProgressCounter<string>(
                waitHandle, fiber, reporter);
            var listener = new GeneralGraphListener<string>(
            var walker = RetlangMacros.CreateGraphWalker(
                () => new Spider(baseUrl).FindReferencedUrls,
            foreach (string url in listener.NodesEncountered.OrderBy(url => url)) {

    class Spider {
        string _baseUrl;

        public Spider(string baseUrl) {
            _baseUrl = baseUrl.ToLowerInvariant();
        }

        public IEnumerable<string> FindReferencedUrls(string pageUrl) {
            // Path.GetExtension includes the leading dot, so compare against ".css".
            if (Path.GetExtension(pageUrl) == ".css")
                return new string[0];
            string content = GetContent(pageUrl);
            return from url in Urls(content, "href='(?<Url>[^'<>]+)'")
                            .Union(Urls(content, "href=\"(?<Url>[^\"<>]+)\""))
                            .Union(Urls(content, "href=(?<Url>[^'\" <>]+)"))
                       where url != null && url.Length > 0
                            && IsInternalLink(url) && url[0] != '#'
                            && !url.Contains("&lt")
                            && !url.Contains("[")
                            && !url.Contains("\\")
                            && !url.EndsWith(".css")
                            && !url.Contains("css.axd")
                       select ToAbsoluteUrl(pageUrl, url);
        }

        static int BaseUrlIndex(string url) {
            // This finds the first / after //
            return url.IndexOf('/', url.IndexOf("//") + 2);
        }

        string ToAbsoluteUrl(string url, string relativeUrl) {
            // Strip any fragment so the same page isn't visited twice.
            int hashIndex = relativeUrl.IndexOf('#');
            if (hashIndex >= 0) {
                relativeUrl = relativeUrl.Substring(0, hashIndex);
            }
            // Anything containing // is treated as already absolute.
            if (relativeUrl.Contains("//"))
                return relativeUrl;
            if (relativeUrl.Length > 0) {
                bool isRoot = relativeUrl.StartsWith("/");
                int index = isRoot 
                    ? BaseUrlIndex(url) 
                    : url.LastIndexOf('/') + 1;
                if (index < 0) {
                    throw new ArgumentException(string.Format("The url {0} is not correctly formatted.", url));
                }
                var result = url.Substring(0, index) + relativeUrl;
                return result;
            }
            return null;
        }

        bool IsInternalLink(string url) {
            url = url.ToLowerInvariant();
            if (url.StartsWith(_baseUrl))
                return true;
            if (url.StartsWith("http") || url.StartsWith("ftp") || url.StartsWith("javascript"))
                return false;
            if (url.Contains("javascript-error"))
                return false;
            return true;
        }

        static IEnumerable<string> Urls(string content, string pattern) {
            var regex = new Regex(pattern);
            // Why exactly doesn't MatchCollection implement IEnumerable<Match> ?
            return from match in regex.Matches(content).Cast<Match>()
                   select match.Groups["Url"].Value;
        }

        static string GetContent(string url) {
            var request = WebRequest.Create(url);
            request.Proxy = WebRequest.DefaultWebProxy;
            try {
                using (var response = request.GetResponse()) {
                    using (var reader = new StreamReader(response.GetResponseStream())) {
                        return reader.ReadToEnd();
                    }
                }
            } catch (WebException ex) {
                Console.Error.WriteLine("Problem reading url {0}, message {1}.", url, ex.Message);
                return "";
            }
        }
    }

