Startup logbook – v0.2 – distributed Clojure system in production

This past weekend, we pushed another major release into production. We’ve been working on several things and have made a few pushes since the last time I wrote about this – but this release has a bunch of interesting Clojure related stuff.

Long-running processes

The main thing of note is that the majority of our back-end is now written in Clojure. You might recall that our online-merchant customers send us a lot of data, and we run a ton of analytics on that data. Our initial plans involved Ruby, but as we started using Clojure, it turned out that it is very well suited for this job as well (long running, message-driven processes that crunch numbers).

The raw data sits in HBase, and every night a “master” process starts up which kicks-off the processing of the previous day’s worth of data. The job of this master is only to coordinate the work (it doesn’t actually do any real work), it does this by breaking work into chunks and dispatching messages that each assign work to any worker process that picks it up. The master is single threaded for simplicity, but failure tolerant – it checkpoints everything in a local MySQL database, and if it crashes, it is automatically re-spawned and it recovers from where it left off.

clojure-in-production-v0.2.png

An elastic cloud of worker processes run in anticipation of the master handing out this work. The worker processes use the MySQL database to keep track of their progress as well. The rest is rather domain-specific. We use intermediate representations of the raw data, which is also stored in HBase, before finally storing the summarized version again in HBase.

Swarmiji

We use an in-house distributed-programming framework called Swarmiji to make such distributed programs very easy to write and run. Swarmiji implements a flavor of staged event-driven architecture (SEDA) to allow server processes that exhibit scalable, predictable throughput. This is especially true in the face of over-load, which we can certainly expect in our environment.

The reason I wrote this framework was that I wanted to create distributed, parallel programs which exploited large numbers of machines (like in a data-center) – without being limited by clojure’s in-JVM-threads-based model. So each worker process in Swarmiji gets deployed as a shared-nothing JVM process.

I will write up a post introducing Swarmiji in the next few weeks – once its a bit more battle-tested, and I’ve added a few more features (mainly around process management).

3 thoughts on “Startup logbook – v0.2 – distributed Clojure system in production

  1. curious: why the full jvm process vs. e.g. clojure-based concurrency? are you really getting the most efficient use of the hardware sevaks run on?

  2. Probably not, who knows?! I have yet to profile the system.

    Meanwhile, each sevak server process uses Clojure’s built-in agent system to handle requests. So it does utilize what I assume you meant when you said clojure-based concurrency. However, by just using agents alone, you can only scale by scaling your jvm (say by using terracotta). Swarmiji allows you to also scale across machines – while giving you resiliency because the failure of one machine doesn’t cause complete failure.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s