Running a Clojure function periodically

I was working on collecting stats about our API servers, and needed to connect them to some kind of visualization system. The idea, of course, is that measurement drives all future optimizations and improvements, so we need to be able to quickly see what’s going on in our processes.

We settled on using clojure-metrics to do the actual data collection from within our code, and then sending it all to the excellent Librato service for monitoring. 

One thing I wanted was to send a snapshot of all collected metrics every 30 seconds. For this, I had a function called report-all-metrics that I essentially needed to run every 30 seconds. It would collect everything from the metrics registry, connect to the Librato API, and send everything over. It would be trivial to write this in a custom way in Clojure, by wrapping it in another function that recursively calls itself after sleeping for the desired duration.

However, I figured I’d wrap ScheduledThreadPoolExecutor from the java.util.concurrent package and get the benefits of the runtime managing this for me instead. I ended up with a function called run-thunk-periodically which does essentially what I described earlier. Here’s the code:
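
A minimal sketch of what this can look like follows; the helper names and the error reporting here are illustrative rather than the exact original, but the shape is the important part: a single-threaded ScheduledThreadPoolExecutor, a thread factory that names its threads, and a wrapper that keeps an exception in one run from killing the schedule.

(import '[java.util.concurrent ScheduledThreadPoolExecutor ThreadFactory TimeUnit])

(defn named-thread-factory
  "A ThreadFactory that gives its threads a recognizable name, so they
   show up clearly in a profiler or a thread dump."
  [pool-name]
  (let [counter (atom 0)]
    (reify ThreadFactory
      (newThread [_ runnable]
        (doto (Thread. runnable)
          (.setName (str pool-name "-" (swap! counter inc))))))))

(defn run-thunk-periodically
  "Runs thunk every period-millis milliseconds on a single-threaded
   scheduled executor. Exceptions are caught and reported, because
   scheduleAtFixedRate suppresses all future runs if a task throws."
  [thunk period-millis pool-name]
  (let [executor (ScheduledThreadPoolExecutor. 1 (named-thread-factory pool-name))
        safe-thunk (fn []
                     (try
                       (thunk)
                       (catch Exception e
                         (println "Error in" pool-name "-" (.getMessage e)))))]
    (.scheduleAtFixedRate executor safe-thunk 0 period-millis TimeUnit/MILLISECONDS)
    executor))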

Here it is in action:
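
A hypothetical call (in the real setup the thunk is report-all-metrics with a 30-second period; a printing thunk here makes it easy to see it firing):

(def reporter
  (run-thunk-periodically
    #(println "reporting all metrics...")
    30000              ;; every 30 seconds
    "metrics-reporter"))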

And the output looks like this:
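
With the printing thunk above, it’s roughly one line every 30 seconds, printed from a thread named after the pool (e.g. metrics-reporter-1):

reporting all metrics...
reporting all metrics...
reporting all metrics...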

The idea is that it just works as expected, and when an exception is thrown, it tells you what’s going on in the logs. Also, the thread-pool name is set appropriately, so you can identify the threads in a profiler.

Hope this is useful to someone!

Six year founder vesting

I’ve always wanted to live in Silicon Valley. I grew up in India, in Bangalore, and we were always proud of the fact that we were the “Silicon Valley of India”. When I was young, I never really understood what that meant, but I knew it was something cool. My dad was also in the tech industry (embedded systems and avionics mostly) and so I grew up in a home full of conversations about science, engineering, and math. Still, Silicon Valley always seemed a world away, and I was happy just living in the Indian version of it.

Before working at my first startup here in the valley, I always regarded startups with some awe… I admired the innovation, the drive, the speed, and the fact that people took the risk and plunged in. And since we mostly only read about the successes, it was all very exciting. I was eager for my own adventure for as long as I could remember… Of course, within a few months of living and working here, I realized that my ideas were incomplete. There was still all the glitzy sexiness, but there was also the other 95% of the story – where startups fail for all manner of reasons. A lot of the time, the reason is plain poor execution.

But the biggest lesson of all was that it takes persistence to succeed. That there are no overnight successes, and the ones you hear about take years in the making. That the number one reason a startup fails is that the founders give up (the number two reason is that they run out of money, and then give up). The fact that most successes take a long time makes sense to me now – anything of value takes time to figure out, and anything truly ambitious needs multiple rounds of failures to learn from.

When we started Zolo Labs, we knew we wanted to build something valuable. Not just monetarily, but also something that moved the state of the art forward – if not for the world at large, at least in the space of productivity and business applications. We’re building software to support the future of work – that’s a grand thing to say, even if it is unclear what it might actually mean :-) But seriously, we do have a rather grand vision, and we’re fully aware that to realize even a portion of it, it’s going to take a fair bit of time. We’re believers in the lean way of doing things, so hopefully we’ll learn, iterate, and build our way towards our goals.

Who knows what meandering paths we’ll take on this journey, but we want to give ourselves time to get there. We’ve structured our own vesting accordingly – both Siva and I have set up our founder stock to vest over 6 years (instead of the standard 4), and have a 2 year cliff (instead of the standard 1 year). We believe we’re in it for the long haul, and want to put our (potential) money where our mouth is. This is unusual – almost no one I know is doing this – but there is at least one precedent that we know of, over at AngelList.

We’re also going to want to fill our company with folks who believe in our vision and want to help us make it happen for its own sake. We’re thinking hard about how we’re going to structure our ESOP, and we think it’s likely to look very similar to our own vesting. Obviously, we’ll happily give out more equity in the process, and pay people enough cash so they can work with us (rather than pay people to work with us).

Since this is our first go as founders, I figured we’d share our thoughts as we build out the company, and this is part of that series. Please drop us a note if you know others who’re using non-standard vesting plans – we’d love to hear about them.

P. S. We’re still pre-funded, so we’re not hiring yet, but we’re receiving inquiries about internships and the like – so if you’re interested in helping the cause while working on Clojure, Datomic, and Storm, drop us a line at @amitrathore or @sivajag!

From Runa to Zolo Labs in 54 months

Runa

I’ve been at Runa for about four and a half years: I was there pretty much from the beginning, and for the past couple of years, I ran the tech team as the CTO and VP of Engineering. We’ve built an amazing product, with quite a small team of engineers. We picked tools that gave us leverage, and I think in large part, these choices let us stay small and lean.

We were among the first companies in the world to push a Clojure service into production… it was the fall of 2008, and Clojure was still pre-version-1.0. In those days, it still used Subversion for version control… and I remember having to tag commit #783 or something… so we’d be sure some unknown incompatibility wouldn’t derail us. We finally did an official release of that in Jan of 2009. Fun times :-)

Since then, we’ve grown our code-base to tens of thousands of lines of Clojure code, and have worked with HBase + Hive + MapReduce, Cascalog, RabbitMQ, Redis, MonetDB, Hazelcast, and loads of AWS services. We coded several state of the art machine-learning algorithms for predictive modeling. Rapunzel, our Clojure DSL, is used by several non-technical folks every day to support our merchants. It’s really neat stuff.

And the code is blazingly fast. For instance, we’re one of the only 3rd party services deployed inside eBay’s data-centers, and are called in real-time for their live pages. To be specific, we’re called about 100 to 200 million times a day, with spikes of over a billion, and have an average response time of 640 microseconds. This service makes statistical predictions about [redacted], and runs over 800,000 models concurrently. Like I said – fun times…

More than anything, I worked with some incredibly talented people over these past few years… not just here in the Bay Area, but also in Bangalore, India. I think the work we pulled together as a team, and the relationships I’ve made with these people are what I’m most proud of…

Zolo Labs

This past month, though, was my last as Runa’s CTO. I’m still engaged as the chief technologist, but I’m no longer a full-time employee. Why did I leave? Well… it’s been a long time, and I’ve been itching to do this Zolodeck thing for a while. And I see good things in the future for Runa, and I’m certain these things will happen without me as well… the team is rock solid… which gives me confidence to start my own adventure!

Some of you know what we’re building – our first product is called Zolodeck – but there’s a deeper philosophy behind that choice. I’ll delve into that in another post soon, but for now I’ll just say we want to improve how people collaborate and converse with each other, whether with their friends and family, or when they’re at work.

And for the techie readers of this blog, yes, we’re using a lot of cutting-edge technology to build out this product including Clojure, Datomic, Storm, and ClojureScript. Here’s a link to a talk I did at Strange Loop last year, discussing some elements of our stack.

It’s an exciting time for me… I’ve been waiting to be an entrepreneur for nearly 12 years… and it’s finally here! Boy, do I feel anxious and excited. Anxcited?

To write tests or to not write tests…

At the Bay Area Clojure User Group this evening, a familiar discussion came up. Yea, about unit-testing in Clojure. And someone said, “But Rich Hickey talks of TDD being guard-rails-driven programming… and who wants to drive their car banging into the guard rails?”

Implying that TDD isn’t particularly useful or required in a language like Clojure. 

So here’s what I said then, and here’s what I say now (and yes, this is the official Zolo Labs stand on this): if you’re Rich Hickey, you don’t need TDD or unit-testing. Otherwise, STFU and write tests.

OK, I didn’t actually say STFU.

Why Datomic?

Many of you know we’re using Datomic for all our storage needs for Zolodeck. It’s an extremely new database (not even version 1.0 yet), and is not open-source. So why would we want to base our startup on something like it, especially when we have to pay for it? I’ve been asked this question a number of times, so I figured I’d blog about my reasons:

  • I’m an unabashed fan of Clojure and Rich Hickey
  • I’ve always believed that databases (and the insane number of optimization options) could be simpler
  • We get basically unlimited read scalability (by upping read throughput in Amazon DynamoDB)
  • Automatic built-in caching (no more code to use memcached (makes DB effectively local))
  • Datalog-as-query language (declarative logic programming (and no explicit joins)) – see the sketch after this list
  • Datalog is extensible through user-defined functions
  • Full-text search (via Lucene) is built right in
  • Query engine on client-side, so no danger from long-running or computation-heavy queries
  • Immutable data – keeps all versions of everything automatically, giving you a built-in audit trail
  • “As of” queries and “time-window” queries are possible
  • Minimal schema (think RDF triples, except Datomic tuples also include the notion of time)
  • Supports cardinality out of the box (has-many or has-one)
  • These reference relationships are bi-directional, so you can traverse the relationship graph in either direction
  • Transactions are first-class (can be queried or “subscribed to” (for db-event-driven designs))
  • Transactions can be annotated (with custom meta-data) 
  • Elastic 
  • Write scaling without sharding (hundreds of thousands of facts (tuples) per second)
  • Supports “speculative” transactions that don’t actually persist to datastore
  • Out of the box support for in-memory version (great for unit-testing)
  • All this, and not even v1.0
  • It’s a particularly good fit with Clojure (and with Storm)
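
To make a couple of these points concrete – the Datalog queries and the in-memory version – here’s a tiny sketch; the :person/name attribute and the datomic:mem URI are just for illustration:

(require '[datomic.api :as d])

;; An in-memory database: great for unit tests and quick experiments.
(def uri "datomic:mem://example")
(d/create-database uri)
(def conn (d/connect uri))

;; Minimal schema: a single attribute, with cardinality declared up front.
@(d/transact conn
   [{:db/id                 (d/tempid :db.part/db)
     :db/ident              :person/name
     :db/valueType          :db.type/string
     :db/cardinality        :db.cardinality/one
     :db.install/_attribute :db.part/db}])

;; Assert a fact...
@(d/transact conn [{:db/id       (d/tempid :db.part/user)
                    :person/name "Rich"}])

;; ...and query it with Datalog, against an immutable database value,
;; with the query engine running on the peer (client) side.
(d/q '[:find ?name
       :where [?e :person/name ?name]]
     (d/db conn))
;; => #{["Rich"]}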

This is a long list, but perhaps begins to explain why Datomic is such an amazing step forward. Ping me with questions if you have ‘em! And as far as the last point goes, I’ve talked about our technology choices and how they fit in with each other at the Strange Loop conference last year. Here’s a video of that talk.

Logging in Clojure / JVM – Part 4

In the previous part of this series, we learnt how we could store log data so it’s easy to get insights from it later. For instance, we use the GELF log format on Zolodeck, our side project. In this part, we’ll look at how to actually get insights from our logs, using a bunch of open source tools.

Here are the three simple steps to getting insights from our logs:

1) Write Logs
2) Transport Logs
3) Process Logs

Simple enough!

Write Logs:

In part 3, we saw how logging in a standard JSON format is beneficial. Some of my readers asked why not use Clojure or Ruby data structures instead of JSON. Here’s why it’s better to use the JSON format:

  • JSON is accessible from all languages
  • There are already a bunch of tools available to transport and process logs that accept JSON
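
As a quick sketch of what writing such a line can look like from Clojure – assuming cheshire for JSON generation and clojure.tools.logging on top of the SLF4J setup from part 2; the field names are illustrative:

(require '[cheshire.core :as json]
         '[clojure.tools.logging :as log])

(defn log-event
  "Logs one machine-readable line: a single JSON object per event."
  [level message fields]
  (log/log level
           (json/generate-string
            (merge {:message   message
                    :timestamp (System/currentTimeMillis)
                    :host      (.. java.net.InetAddress getLocalHost getHostName)
                    :level     (name level)}
                   fields))))

(log-event :info "order placed" {:order-id 1234 :user-id 42})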

Transport Logs:

Always write logs to local disk. It is tempting to have a log4j appender that directly sends logs to a remote server using UDP or HTTP, but you can’t guarantee that you’ll consistently reach those servers. So it’s better to write to local disk first, and then transport your logs to wherever they’re going to be processed. There are many open-source tools available for transporting logs, and your choice will depend on what tool you end up using to process them. Some tools you can use for transporting logs are:

  • Scribe – open-sourced by Facebook. This tool does more than just transport logs.
  • Lumberjack
  • rsync
  • Logstash – This tool does a lot more than transport logs.

Process Logs:

We need to be able to collect, index, and search through our log data for anything we care to find. Some open-source tools out there for processing logs are:

  • Logstash – Like I said, this tool does a lot more than transport logs :)
  • Graylog2

Both Logstash (with Kibana) and Graylog2 provide web interfaces that make life easy when analyzing and searching the underlying logs.

As you can see there are many options for managing and analyzing logs. Here’s what we currently do in our project:

[Diagram of our current logging setup]

It is simple for now and we’re hoping to keep it that way :)

Conclusion

Logs are useful. Logs that provide us with insights are more useful :), and if they do this easily, they’re even more so. When we start a new project, we need to spend some time thinking about logging up front, since this is a crucial part of managing growth. Thanks to a variety of open-source tools and libraries, it is not that expensive to try out different logging strategies. A properly thought-out logging architecture will save you a lot of time later on. I hope this series has shed some light on logging and why it’s important in this age of distributed computing.

Please do share your experiences and how you’re handling logging on your projects. What went well? What didn’t? I’d love to see what folks are doing out there, and document it here, to make this knowledge available for others. Onward!

Pretty-printing in Clojure logs

Logging is an obvious requirement when it comes to being able to debug non-trivial systems. We’ve been thinking a lot about logging, thanks to the large-scale, distributed nature of the Zolodeck architecture. Unfortunately, when logging larger Clojure data-structures, I often find some kinds of log statements a bit hard to decipher. For instance, consider a map m that looked like this:
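
Something nested along these lines (the actual data doesn’t matter, only its shape):

(def m
  {:user     {:id 42, :name "jane", :roles [:admin :ops]}
   :request  {:path "/api/v1/metrics", :params {:from "2013-01-01", :to "2013-01-31"}}
   :response {:status 200, :latency-millis 17}})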

When you log things like m (shown here with println for simplicity), you may end up needing to understand this:
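
For a map shaped like the one above, that means squinting at something like this:

user=> (println "processing request - " m)
processing request -  {:user {:id 42, :name jane, :roles [:admin :ops]}, :request {:path /api/v1/metrics, :params {:from 2013-01-01, :to 2013-01-31}}, :response {:status 200, :latency-millis 17}}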

Aaugh, look at that second line! Where does the data-structure begin and end? What is nested, and what’s top-level? And this problem gets progressively worse as the size and nested-ness of such data-structures grow. I wrote the following function to help alleviate some of the pain:
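
A sketch of that helper – just a thin wrapper over clojure.pprint:

(require '[clojure.pprint :as pprint])

(defn pp-str
  "Pretty-prints x and returns the result as a string, so it can be
   dropped into any log statement."
  [x]
  (with-out-str (pprint/pprint x)))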

Remember to include clojure.pprint. And here’s how you use it:
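
With the same map as before:

(println "processing request - " (pp-str m))
;; the same data, now spread across multiple indented lines,
;; with the nesting plainly visible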

That’s it, really. Not a big deal, not a particularly clever function. But it’s much better to see this structured and formatted log statement when you’re poring over log files in the middle of the night.

Just note that you want to use this sparingly. I first modified things to make ALL log statements automatically wrap everything being logged with pp-str: it immediately halved the performance of everything. pp-str isn’t cheap (actually, pprint isn’t cheap). So use with caution, where you really need it!

Now go sign-up for Zolodeck!

Logging in Clojure / JVM – Part 3

In parts 1 and 2, we looked at the history of logging and how to use SLF4J (a library I’m using with Zolodeck). In this part, we’re going to learn about the different formats we can use for our logs. We need to choose the right log format so that we can get insights from our logs when we want them. If we cannot easily process the logs to get at the insights we need, then it doesn’t matter what logging framework we use or how many gigabytes of logs we collect every day.

Purpose of logs:

  • Debugging issues
  • Historic analysis
  • Business and Operational Analysis

If the logs are not adequate for these purposes, it means we’re doing something wrong. Unfortunately, I’ve seen this happen on many projects.

Consumer of logs:

Before we look into what format to log in, we need to know who is going to consume our logs for insights. Most logging implementations assume that humans will be consuming log statements, so they’re essentially formatted strings (think printf) that humans can easily read. In these situations, what we’re really doing is creating too much log data for humans to consume and get any particularly useful insights from. People then try to solve this overload problem by being cautious about what they log, the idea being that less information will be easier for humans to handle. Unfortunately, we can’t know beforehand what information we may need to debug an issue, so what always ends up happening is that some important piece of information is missed out.

Machines, on the other hand, can be programmed to consume lots of data and provide better insights. So instead of creating log files for humans to consume, we need to create them for machines.

Format of logs:

Now that we know that machines will be consuming our logs, we need to decide what format our logs should be in. Optimizing for machine readability makes sense, of course.

Formatted strings:

We could easily write a program that uses regexes to parse formatted-string log messages. Formatted strings, however, are still not a good fit, for the following reasons:

  • Logging Java stack traces can break our parser, thanks to newline characters
  • Developers can’t remove or add fields without breaking the parser.

What is a better way then?

JSON Objects:

JSON objects aren’t particularly human-readable, but machines love them. We can use any JSON library to parse our logs. Developers can add and remove fields, and our parser will still work fine. We can also log Java stacktraces without breaking the parser, by treating each stacktrace as just another field of data.

JSON log object fields:

Now that it makes sense to use JSON objects as logs, the question is what basic fields ought to be included. Obviously, this will depend on the application and business requirements. But at a minimum, we’d need the following fields:

  • Host
  • Message
  • Timestamp
  • Log Level
  • Module / Facility
  • File
  • Line Number
  • Trace ID
  • Environment
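
Putting those together, a single log event might look something like this (shown pretty-printed here for readability – in the actual log it would typically be one object per line; the values are illustrative):

{
  "host": "api-server-01",
  "message": "order placed",
  "timestamp": "2013-02-07T18:31:42.123Z",
  "level": "INFO",
  "facility": "orders",
  "file": "orders/core.clj",
  "line": 42,
  "trace_id": "a1b2c3d4",
  "environment": "production"
}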

Standard JSON log format:

Instead of coming up with a custom JSON log format, we ought to just use a standard one. One option is GELF (the Graylog Extended Log Format), which is supported by many log analysis tools. There are a lot of open-source log appenders that create logs in the GELF format. I’m using it on my side project Zolodeck, where we use logback-gelf.

In this part of the blog series, we learnt why we need to think about machine-readable logs, and why we ought to use JSON as the log format. In the next part, we will look at how to get insights from logs, using a bunch of open-source tools.