The Long Queue Problem

23 May 2010 22:37

Dan North once described an advanced beginner as someone who makes all the mistakes an advanced beginner makes.  It doesn't matter if you warn them about the mistake or not, they'll still make it.  This post is about one of those problems.  Hopefully someone will find it useful, but what I know of the Dreyfus model suggests that it'll only be useful after something fails in production.

Here's the problem: let's say you have a process.  It's fairly easy to divide into two parts.  The first, A, is fast and feeds the second, B, which is slow.  So, you make it concurrent following some kind of pipes and filters arrangement.  Problem is, you get exactly the same effect when you push a motorway's traffic onto a B-road: traffic jams for miles.

A traffic jam may not sound like a bad thing, but you need to monitor them.  Any long system queue can be a sign that your architecture is failing.  The problem is even worse if you're using in-process concurrency like Retlang or the TPL.  All of your queues are in memory.  Too long a queue and you'll crash.  But even if you're using MSMQ, you're still subject to the Micawber Principle.

Dealing With It

Here's some things you can do that don't fix the problem, but help you scale further:

  • If the queue is going to be huge, consider reducing the memory footprint of the message passed between A and B. 
  • Make your application 64-bit.
  • Make the queue persist to a database.  This doesn't reduce the jam, but it does reduce the memory consumption.  (Obviously, if you're using MSMQ or similar, this is done for you.)

Although they don't address the root cause, these can be useful nonetheless.  They could allow you to scale as much as you need, but they're not an unlimited solution.

An approach that sometimes fixes the problem is to parallelize B's processing.  This is only appropriate in certain circumstances, and there's a law of diminishing returns as you add more threads.  For instance, if B is 4 times slower than A, and perfectly parallelizable, then four worker threads will solve the problem.  However, if it's 25 times slower, and parallelism maxes out at a factor of 16, it's going to improve matters, but not fix it.

Finally, there's addressing the root cause: 

  • Optimize B so that it's fast enough.  Sometime possible and worth considering.
  • After you've tried that, redesign B into multiple parts, each of which are, on average, at least as fast as A.  If you've got a factor of 100 difference, that's pretty much the only thing that'll solve the problem.

Basically, balancing performance isn't a nice to have, it's essential if you want your system to work at all.

Comments
Gravatar
# re: The Long Queue Problem
Posted by Chris Patterson on 23/05/2010 23:25
Of course, another option might be to control the rate of A. Perhaps a pull or join arrangement between A/B might be more suitable than just slamming A into B.

Of course, in many cases, the rate of A can't be controlled and you make good sense here. I just wanted to point out the obvious to make sure it's been ruled out before digging deeper.
Gravatar
# re: The Long Queue Problem
Posted by Julian on 24/05/2010 07:41
A very good point. I'm thinking that, as a rule of thumb, you can adequately control the rate if you're batch processing. If, on the other hand, you're responding to user input or a feed, you don't have that option.

A third option I should have mentioned: just throw some of the messages away. Usually this isn't possible, but when it's relevant (e.g. a price feed) it can be extremely useful. Retlang's got a feature for this (batch keyed subscriptions) but I think it requires the in-memory nature of the queues. I doubt implementing such a feature on MSMQ would buy you very much, although I could be wrong. You could view this as a subset of the problem space of "optimize B", of course :)
Something to add?

Talking sense? Talking rubbish? Something I'm missing? Let me know!

Fields denoted with a "*" are required.

 (will not be displayed)

 
Please add 3 and 5 and type the answer here:

Preview Your Comment