Clojure has a Problem with Async

Thursday, February 21, 2013 9:04 PM

Clojure, like node.js, is a very opinionated platform.  The funny thing is that almost every opinion is different. 

Clojure embraces Java as a platform. 

  • Originally, every declared identifier was overridable on a per-thread basis.
  • There are many features (e.g. futures and reducers) that let you embrace multi-threading at a high level.
  • Data is immutable.
  • Data is globally shared between threads.
  • It adds STM to Java’s already extensive thread-synchronization primitives.
  • Everything’s a function. 
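
A few of the items in the list above (futures, shared data, STM) can be seen together in a short sketch.  The names `counter` and `bump!` are made up for illustration; everything else is plain clojure.core.

```clojure
;; Illustrative sketch: `counter` and `bump!` are hypothetical names.
(def counter (ref 0))            ; STM-managed reference, shared by every thread

(defn bump! []
  (dosync (alter counter inc)))  ; the transaction retries automatically on conflict

;; Ten futures, each running on a pool thread, increment the shared ref.
(let [workers (doall (repeatedly 10 #(future (dotimes [_ 100] (bump!)))))]
  (doseq [w workers] @w)         ; deref blocks until each future completes
  (println @counter))            ; prints 1000

(shutdown-agents)                ; lets the JVM exit promptly when run as a script
```

No locks appear anywhere in the code: the ten threads coordinate purely through the transaction.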

Node, conversely, embraces JavaScript.

  • It’s aggressively single-threaded and asynchronous.
  • If you want another thread, you’ll have to start another process.
  • Everything’s mutable, even class definitions.
  • Share data between processes?  I hope you like memory mapping.
  • Synchronization barriers?  You don’t need them. 
  • Everything’s an event with a callback.

Clojure and Node have completely different sweet spots: Clojure is truly excellent at computation, Node at IO.  Like it or not, multiple threads aren’t really a good solution to blocking IO.  Which is a pity, because all the main Clojure libraries feature blocking IO (e.g. clojure.java.jdbc, ring).  That’s not to say there isn’t some amazing stuff being done in Clojure, just that it could be even better.
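
To make the point about threads and blocking IO concrete, here is a minimal sketch.  `slow-query` is a hypothetical stand-in for a blocking call such as a JDBC query, and `query-async` is an invented wrapper, not part of any library.

```clojure
;; Hypothetical stand-in for a blocking call such as a JDBC query.
(defn slow-query [id]
  (Thread/sleep 100)             ; the calling thread is parked here
  {:id id :rows 3})

;; Callback-style wrapper: the caller returns at once,
;; but a pool thread still sits blocked inside slow-query.
(defn query-async [id callback]
  (future (callback (slow-query id))))

(let [results (atom [])
      pending (doall (for [id (range 5)]
                       (query-async id #(swap! results conj %))))]
  (doseq [p pending] @p)         ; wait for every callback to finish
  (println (count @results) "results, each of which held a thread while it waited"))

(shutdown-agents)
```

The API looks asynchronous, but the cost has only moved: five in-flight queries still mean five blocked threads.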

JDBC is an interesting case because it’s a Java problem that works its way through to Clojure.  Node.js made a virtue of being the only API on a new platform; Clojure instead inherits Java’s blocking JDBC.  However, the Clojure wrapper introduces a couple of oddities of its own.  For instance, clojure.java.jdbc can only have one open database connection at once.  That is usually all you need, but it is sometimes undesirable (try performing a reconciliation of a million records between two databases).  To some extent, this is a hangover of Clojure being envisaged as an application language that used libraries written in Java.

There’s nothing stopping you from writing Clojure code in a node-like style, as long as you’re prepared to write your own web-server (Webbit, Aleph) and DB libraries (er… no-one).  Equally, implementing a feature like co-routines wouldn’t actually be that hard, but you’d lose bindings, which is a problem for any library that assumes that they work.  And you’d still need all of your libraries to be async.
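
The point about losing bindings can be sketched as follows.  Dynamic bindings live on the thread that established them, so code that resumes on a different thread (as a co-routine scheduler might arrange) no longer sees them unless they are captured explicitly.  `*request-id*` and `run-on-new-thread` are hypothetical names used only for this illustration.

```clojure
(def ^:dynamic *request-id* nil)   ; hypothetical dynamic var

;; Runs f on a freshly created thread and returns its result.
(defn run-on-new-thread [f]
  (let [result (promise)
        t      (Thread. ^Runnable (fn [] (deliver result (f))))]
    (.start t)
    @result))

(binding [*request-id* 42]
  (println "same thread:     " *request-id*)                                     ; 42
  (println "plain new thread:" (run-on-new-thread (fn [] *request-id*)))         ; nil
  (println "with bound-fn:   " (run-on-new-thread (bound-fn [] *request-id*))))  ; 42
```

`bound-fn` captures the bindings when the function is created, which helps only if every hand-off between threads remembers to use it; a library that quietly relies on a binding has no such guarantee.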

For all these reasons, I don’t think we’re going to be seeing a proper Clojure async solution any time soon.  Ironically, I think it’s the complete absence of async DB libraries that is really holding it back.  Without that, solving most of the other things isn’t really that useful.

Comments
# re: Clojure has a Problem with Async
Posted by my eyes on 2/21/2013 10:53 PM
and you, sir, have a problem with colors :)
# re: Clojure has a Problem with Async
Posted by Ryan Kelker on 2/22/2013 2:04 AM
You might want to read this and see if you feel the same. 600k concurrent HTTP connections, with Clojure & http-kit http-kit.org
# re: Clojure has a Problem with Async
Posted by Julian on 2/22/2013 3:23 AM
I probably should have mentioned http-kit in the main article. It's impressive, but it doesn't actually solve the blocking IO issue, just hives it off into its own threadpool (the very first example on the homepage uses future, for instance). If you want to do async work at any deeper level, you're still left on your own with a set of language features and libraries that don't really have you in mind.

Don't get me wrong, http-kit looks like a superb piece of work, but I seriously doubt there's much difference between using it or Jetty to access a Postgres DB.
# re: Clojure has a Problem with Async
Posted by Anonymous on 2/22/2013 8:50 AM
Async I/O has been around on the JVM since the mid-2000s. Libraries such as Netty (and what is built on top of them, like Finagle) make use of the same OS APIs node.js does. Node is by no means special: it uses the same async I/O APIs and has a single scheduler (run loop), so only one thread is ever available to your programs. The Erlang VM has multiple schedulers, the JVM (via the OS thread scheduler) also has multiple ones, and I believe so does Go.
The decision to use callbacks or futures is very much an API design choice.

From the runtime perspective, Node.js has no advantage in the I/O department but it has severe limitations in other ones as far as a single process goes.

So, Clojure or any other JVM language does not have "an async problem". People have been doing async I/O in Perl and Python since the late 90s and early 2000s, and on the JVM since the mid-2000s, and the related OS APIs haven't really changed since then.

And while we are at it, there are async data store and HTTP clients in Clojure (http-kit, aleph, Casyn and so on) and underlying JVM libraries often use async I/O without bragging about it (see ElasticSearch native API). Unless you need to support thousands of connections, however, blocking I/O with thread pools works just as well (can get you excellent throughput and latency) and does not end up being a callback soup. But it's not fancy so you won't read about it on Hacker News, I know.
# re: Clojure has a Problem with Async
Posted by Feng Shen on 3/10/2013 3:26 AM
Async is hard, very hard to get right. When it is used in an application (not just a library) it gets even harder, because applications change a lot, and very quickly.

My idea is to be async in the (very few) key parts, but sync for the most part.

When developing http-kit, Peter and I spent plenty of time thinking about the async API to export, even though I had just two APIs to export: with-channel (for the server) and request (for the client). Even though I am the author of http-kit, in my own applications I prefer to use the blocking API: it's fast and easy. I only use the async API when there is no choice. By the way, I am a web developer; I write server-side and client-side JavaScript code.

Blocking is bad, since it prevents you from doing anything but wait. It hurts performance. But when it is used with threads, performance is no longer an issue, and you have a much better API.

For database access, I think blocking jdbc is good. I also wrote a connection pool for Clojure: https://github.com/http-kit/dbcp.clj. It will give you the performance you need, probably faster than you think, even faster than node.js's async one (not tested).

Feel free to tell me what you think; I am open to discussion.