Logging in Clojure / JVM – Part 4

In the previous part of this series, we learnt how we could store log data so it’s easy to get insights from it later. For instance, we use the GELF log format on Zolodeck, our side project. In this part, we’ll look at how to actually get insights from our logs, using a bunch of open source tools.

Here are the three simple steps to get insights from our logs:

1) Write Logs
2) Transport Logs
3) Process Logs

Simple enough!

Write Logs:

In part 3, we saw how logging in a standard JSON format is beneficial. Some of my readers asked me why we don’t use Clojure or Ruby data structures instead of JSON. Here’s why JSON is the better format:

  • JSON is accessible from all languages
  • There are already a bunch of tools available to transport and process logs that accept JSON

Transport Logs:

Always write logs to local disk. It is tempting to have a log4j appender that directly sends logs to a remote server using UDP or HTTP; unfortunately, you can’t guarantee that you’ll consistently reach those servers. So it’s better to write to local disk first, and then transport your logs to wherever they are going to be processed. There are many open source tools available for this, and your choice will depend on what tool you end up using to process your logs. Some tools you can use for transporting logs are:

  • Scribe – Facebook open-sourced this. This tool does more than just transporting logs.
  • Lumberjack
  • rsync
  • Logstash – This tool does a lot more than transporting logs.

Process Logs:

We need to be able to collect, index, and search through our log data for anything we care to find. Some open-source tools out there for processing logs are:

  • Logstash – Like I said, this tool does a lot more than transporting logs :)
  • Graylog2

Both Logstash (with Kibana) and Graylog2 provide web interfaces that make life easy when analyzing and searching the underlying logs.

As you can see, there are many options for managing and analyzing logs. Here’s what we currently do in our project:

[Image: Logging in Clojure 4.001]

It is simple for now and we’re hoping to keep it that way :)

Conclusion

Logs are useful. Logs that provide us with insights are more useful :), and if they do this easily, they’re even more so. When we start a new project, we need to spend some time thinking about logging up front, since this is a crucial part of managing growth. Thanks to a variety of open-source tools and libraries, it is not that expensive to try out different logging strategies. A properly thought-out logging architecture will save you a lot of time later on. I hope this series has shed some light on logging and why it’s important in this age of distributed computing.

Please do share your experiences and how you’re handling logging on your projects. What went well? What didn’t? I’d love to see what folks are doing out there, and document it here, to make this knowledge available to others. Onward!

Logging in Clojure / JVM – Part 3

In parts 1 and 2, we looked at the history of logging and how to use SLF4J (a library I’m using with Zolodeck). In this part, we’re going to learn about the different formats we can use for our logs. We need to choose the right log format so we can get insights from our logs when we want them. If we cannot easily process the logs to get at the insights we need, then it doesn’t matter what logging framework we use or how many gigabytes of logs we collect every day.

Purpose of logs:

  • Debugging issues
  • Historic analysis
  • Business and Operational Analysis

If the logs are not adequate for these purposes, it means we’re doing something wrong. Unfortunately, I’ve seen this happen on many projects.

Consumers of logs:

Before we look into what format to log in, we need to know who is going to consume our logs for insights. Most logging implementations assume that humans will be consuming log statements. So log statements are essentially formatted strings (think printf) that humans can easily read. In these situations, what we’re really doing is creating too much log data for humans to consume and get any particularly useful insights from. People then try to solve this overload problem by being cautious about what they log, the idea being that less information will be easier for humans to handle. Unfortunately, we can’t know beforehand what information we may need to debug an issue. What always ends up happening is that some important piece of information is missed out.

Remember how we can program machines to consume lots of data and provide better insights? So instead of creating log files for humans to consume, we need to create them for machines.

Format of logs:

Now that we know that machines will be consuming our logs, we need to decide what format our logs should be in. Optimizing for machine readability makes sense, of course.

Formatted strings:

We can easily write a program using regexes to parse our formatted-string log messages. Formatted strings, however, are still not a good fit, for the following reasons (illustrated in the sketch after this list):

  • Logging Java stack traces can break our parser thanks to new line characters
  • Developers can’t remove or add fields without breaking the parser.
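
To make this concrete, here’s a minimal sketch; the pattern and log lines are hypothetical. A naive regex parser handles a well-formed line, but a stack-trace continuation line simply doesn’t match.

(def line-pattern #"(\S+) (\w+) (.+)")

;; A well-formed line parses into its parts:
(re-matches line-pattern "2013-01-15 INFO user logged in")
;; => ["2013-01-15 INFO user logged in" "2013-01-15" "INFO" "user logged in"]

;; A stack trace spans many lines; its continuation lines don't
;; match the pattern, so the parser silently drops them:
(re-matches line-pattern "    at com.example.Foo.bar(Foo.java:42)")
;; => nil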

What is a better way, then?

JSON Objects:

JSON objects aren’t particularly human readable, but machines love them. We can use any JSON library to parse our logs. Developers can add and remove fields, and our parser will still work fine. We can also log Java stack traces without breaking our parser, by simply treating each one as a field of data.
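
For instance, here’s a hedged sketch using the cheshire JSON library (the log entry itself is made up): a multi-line stack trace stored as an ordinary string field round-trips through generation and parsing without any special handling.

(require '[cheshire.core :as json])

;; A hypothetical log entry with a stack trace as a plain string field:
(def log-line
  (json/generate-string
    {:message    "NPE while saving user"
     :stacktrace "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)"}))

;; The newlines are escaped inside the JSON string, so the log entry
;; stays on one physical line and parsing just works:
(json/parse-string log-line true)
;; => {:message "NPE while saving user", :stacktrace "java.lang.NullPointerException\n\tat ..."}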

JSON log object fields:

Now that we’ve decided it makes sense to use JSON objects as logs, the question is what basic fields ought to be included. Obviously, this will depend on the application and business requirements. But at a minimum, we’d need the following fields (a small sketch follows the list):

  • Host
  • Message
  • Timestamp
  • Log Level
  • Module / Facility
  • File
  • Line Number
  • Trace ID
  • Environment
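
As a concrete illustration, here’s a hedged sketch of such a log entry as a Clojure map with the fields above (all values are made up), serialized to a single JSON line with cheshire:

(require '[cheshire.core :as json])

(json/generate-string
  {:host        "app-01.example.com"       ; hypothetical hostname
   :message     "user logged in"
   :timestamp   (System/currentTimeMillis)
   :level       "INFO"
   :facility    "auth"                     ; module / facility
   :file        "auth.clj"
   :line        42
   :trace-id    "abc-123"
   :environment "production"})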

Standard JSON log format:

Instead of coming up with a custom JSON log format, we ought to just use a standard one. One option is GELF (Graylog Extended Log Format), which is understood by many log analysis tools. There are a lot of open source log appenders that create logs in the GELF format. I’m using it on my side project Zolodeck, via logback-gelf.

In this part of the series, we learnt why we need to think about machine-readable logs, and why we ought to use JSON as the log format. In the next part, we will look at how to get insights from logs, using a bunch of open source tools.

Why Java programmers have an advantage when learning Clojure

There is a spectrum of productivity when it comes to programming languages. I don’t really care to argue how much more productive dynamic languages are… but for those who buy that premise and want to learn a hyper-productive language, Clojure is a good choice. And for someone with a Java background, Clojure becomes the best choice. Here’s why:

  • Knowing Java – obviously useful: class-paths, class loaders, constructors, methods, static methods, standard libraries, jar files, etc. etc.
  • Understanding of the JVM – heap, garbage collection, perm-gen space, debugging, profiling, performance tuning, etc.
  • The Java library ecosystem – what logging framework to use? what web-server? database drivers? And on and on….
  • The Maven situation – sometimes you have to know what’s going on underneath lein
  • Understanding of how to structure large code-bases – Clojure codebases also grow
  • OO Analysis and Design – similar to figuring out what functions go where

I’m sure there’s a lot more here, and I’ll elaborate on a few of these in future blog posts.

I’ve not used Java itself in a fairly long time (we’re using Clojure for Zolodeck). Even so, I’m getting a bit tired of some folks looking down on Java devs, when I’ve seen so many Clojure programmers struggle because they don’t understand the Java landscape.

So, hey Java Devs! Given that there are so many good reasons to learn Clojure – it’s a modern LISP with a full macro system, it’s a functional programming language, it has concurrency semantics, it sits on the JVM and has access to all those libraries, it makes a lot of sense for you to look at it. And if you’re already looking at something more dynamic than Java itself (say Groovy, or JRuby, or something similar), why not just take that extra step to something truly amazing? Especially when you have such an incredible advantage (your knowledge of the Java ecosystem) on your side already?

Logging in Clojure / JVM – Part 2

In part 1, we looked at the history of logging in Java. This time, we’ll learn more about SLF4J (Simple Logging Facade for Java).

In part 1, we saw that SLF4J is not a proxy to other logging frameworks; rather, it is an API that allows end users to inject their desired logging framework at deployment time. SLF4J comes with adapters for many commonly used logging frameworks.

For my side project, I’m using the SLF4J API. Like many other projects, mine depends on many libraries. Unfortunately, not all libraries use SLF4J; indeed, some of them use the log4j API directly. You’d be surprised how many newly written libraries use log4j (even though log4j is old and horrid). Even if their authors see the benefits of changing to SLF4J, it probably won’t happen soon. SLF4J comes with bridging modules for JCL, JUL, and log4j to consolidate logging. These bridging modules redirect calls made to log4j, JCL, and JUL to SLF4J instead. The image below explains the idea, and a sketch of wiring this up follows it.

http://www.slf4j.org/images/legacy.png

[From http://www.slf4j.org/legacy.html]
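
Here’s a minimal sketch of how this might be wired up in a Leiningen project.clj, assuming logback as the backing implementation; the project name and artifact versions are illustrative, not prescriptive.

(defproject my-app "0.1.0"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [org.clojure/tools.logging "0.2.3"]
                 ;; the actual logging backend:
                 [ch.qos.logback/logback-classic "1.0.9"]
                 ;; bridges that redirect log4j, JCL and JUL calls to SLF4J:
                 [org.slf4j/log4j-over-slf4j "1.7.2"]
                 [org.slf4j/jcl-over-slf4j "1.7.2"]
                 [org.slf4j/jul-to-slf4j "1.7.2"]]
  ;; keep the real log4j and commons-logging jars off the classpath
  ;; so the bridges can take their place:
  :exclusions [log4j commons-logging])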

Mapped Diagnostic Context

Another awesome feature of SLF4J is MDC (Mapped Diagnostic Context). Even though it sounds very complicated, it is simple to use and yet very powerful. MDC is essentially a hash map, maintained by the logging framework, whose entries can be inserted into log messages. Applications can update this hash map through SLF4J. Currently only log4j and logback offer MDC functionality, and SLF4J simply delegates to them. If you use some other logging framework, SLF4J will still maintain the hash map, but you’ll need to write custom code to retrieve the information from it.
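
For the curious, the raw SLF4J MDC API is perfectly usable from Clojure via interop. A minimal sketch (the keys and values here are hypothetical):

(import 'org.slf4j.MDC)

(MDC/put "client-info" "user-42")
(MDC/put "trace-id" "abc-123")
;; any log statement made here carries both entries, provided the
;; backend's pattern layout includes %X{client-info} and %X{trace-id}
(MDC/remove "trace-id")
(MDC/remove "client-info")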

What’s the use of MDC?

One of the main goals of logging is to audit and debug complex, real-world, distributed systems. These systems handle multiple clients simultaneously, so log messages are going to be interleaved. It’s therefore very important to be able to consolidate the log messages of a single client or a single API call. The simplest way is to tag all log messages with client-info and trace-id (we’ll discuss this more in the next part of this series). Without MDC, we’d need to put this information in every logging call. With MDC, all we have to do is set up the context (client-info, trace-id, etc.) and all our log messages will automatically carry this information. This transforms our log messages into an amazing resource for learning about the system and its users.

Using MDC in Clojure:

Unfortunately, as of now, clojure.tools.logging does not support MDC. I think this is a big hole in clojure.tools.logging, given that Clojure is known for building complex systems. Luckily the Clojure community is vibrant, and there’s an open source project by Malcolm Sparks called clj-logging-config. The main purpose of this library is to set up logging configuration programmatically. But it has one function, with-logging-context, that allows us to set up the MDC. Even though I’m not programmatically setting up logging config in my project right now, I am using this library just for the with-logging-context function. I strongly believe this library (or at least the with-logging-context function) should be part of clojure.tools.logging.
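
Here’s a minimal sketch of using it. handle-request and the MDC keys are hypothetical, and the exact calling convention of with-logging-context is from memory, so treat this as illustrative rather than authoritative:

(ns my-app.handler
  (:require [clojure.tools.logging :as log]
            [clj-logging-config.log4j :refer [with-logging-context]]))

(defn handle-request [client-id request]
  ;; every log call in the body is tagged with these MDC entries,
  ;; assuming the pattern layout includes %X{client-info} and %X{trace-id}
  (with-logging-context {"client-info" client-id
                         "trace-id"    (str (java.util.UUID/randomUUID))}
    (log/info "request received")
    ;; ... actual work ...
    (log/info "request handled")))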

So there! In this part we learnt more about SLF4J and MDC. In the next part we will learn more about structured logging.

Calling Java from Clojure

Importing Java Classes

In the REPL, when you want to import a single Java class, you can do:

(import 'java.util.Date)

When you want to import multiple Java classes from the same package, you can do:

(import [java.util Date HashMap])

In a source file, the idiomatic way is to use the :import clause of the ns declaration:

(ns com.techbehindtech.java
  (:import [java.util Date HashMap]))

Creating Instances

(import 'java.util.Date)

(def today (new Date))

;; or

(def today (Date.))

Calling Java instance methods

user> (import 'java.util.Date)
java.util.Date
user> (let [today (Date.)]
        (.getTime today))
1286749020847

Calling Java static methods

user> (System/currentTimeMillis)
1286847946813

Sugar syntax:

Doto:

Say you want to write a function that returns a UTC Java Calendar object set to a specific time.

user> (import [java.util Calendar TimeZone Date])
user> (defn utc-time [d]
        (let [cal (Calendar/getInstance)]
          (.setTimeZone cal (TimeZone/getTimeZone "UTC"))
          (.setTime cal d)
          cal))
#'user/utc-time

The let block is ugly. We could use doto to make this code better.

user> (import [java.util Calendar TimeZone Date])
user> (defn utc-time [d]
        (doto (Calendar/getInstance)
          (.setTimeZone (TimeZone/getTimeZone "UTC"))
          (.setTime d)))
#'user/utc-time

Dot Dot:

Sometimes in Java you want to chain method calls.

Bad way

user> (.length (.getProperty (System/getProperties) "user.country"))
2

Better way

user> (. (. (System/getProperties) getProperty "user.country") length)
2

Even Better

user> (..
       (System/getProperties)
       (getProperty "user.country")
       (length))
2

Avoiding Reflection

By default, Clojure uses reflection to resolve the types involved in Java interop calls. Reflection is slow. But we can give type hints so that the compiler does not have to use reflection. For example, here is a string-length function, first without and then with a type hint:

user> (set! *warn-on-reflection* true)
true
user> (defn str-length [s] (.length s))
Reflection warning, NO_SOURCE_FILE:1 - reference to field length can't be resolved.
#'user/str-length
user> (defn str-length [#^String s] (.length s))
#'user/str-length

Implementing interfaces and extending classes

Let’s implement the Java Runnable interface:

user> (proxy [Runnable] []
        (run []
             (println "running ...")))
#<Object$Runnable$36fc6471 user.proxy$java.lang.Object$Runnable$36fc6471@6dd33544>

Since Clojure 1.2, you can use the reify macro to do this. In fact, it is better than using proxy: reify compiles to a direct class and is faster, and it is the preferred choice when you only need to implement interfaces.

user> (reify Runnable
             (run [this]
                  (println "running ...")))
#<user$eval1664$reify__1665 user$eval1664$reify__1665@574f7121>

Easy external DSLs for Java applications

Or JRuby for fun and profit

I’ve been developing software for some time now, and have recently found myself applying ideas from various esoteric areas of computer science to the everyday tasks of building commonplace applications (often these concepts are quite old; indeed, some were thought of in 1958).

One powerful idea that I’ve been playing with recently is that of embedding a DSL (domain specific language) into your basic application framework – and writing most of the features of the application in that DSL.

This is simply an implementation of the concept of raising the level of abstraction. The point, of course, being that when writing code in a DSL implemented in such a fashion, one can express ideas in terms of high-level abstractions that represent actual concepts from the problem domain. In other words, it is like using a programming language that has primitives rooted in the domain.

A lot of people have been writing about this kind of software design – and most implement these ideas in a dynamic language of their choice. How does one go about doing the same in a language like Java? That is what this article is about. And I cheat in my answer. Consider the following design stack:

[Diagram: Creating DSLs in JRuby]

I propose that you only implement basic and absolutely required pieces of functionality in Java – the things that rely on, say, external systems that expose a Java interface, or some EJB-type resource, or anything else that requires the use of Java. The functionality you develop here is then exposed through an interface to the layers above. You can also add APIs for other support services you might need.

The layer above is a bunch of JRuby code that behaves as a facade to the Java API underneath. This leaves you with a Ruby API to the underlying Java (and whatever else, really!) – making it possible to code against that functionality in pure Ruby. The JRuby interpreter runs as part of the deployable and simply executes all that Ruby code transparently. As far as your Ruby code is concerned, it doesn’t even care that some of the calls are internally implemented in Java. Sweet!

We can stop here. At this point, we are in a position to write the rest of our application in a nice dynamic language like Ruby. For some people, a nice fluent interface in Ruby suffices – and keeps all developers happy. This is depicted on the top-left part of the diagram.

However, we can go one step further, and implement a DSL in Ruby that raises the level of abstraction even more – as referred to earlier. So on top of the Ruby layer, you’d implement a set of classes that allow you to write code in simple domain-like terms, which then gets translated into a form executable by the Ruby interpreter. This is shown in the top right part of the diagram. Ultimately, potentially anyone (developers, QA, or business analysts) could express their intent in something that looks very much like English.

So where to write what?

How much to put in your Java layer depends on the situation – some people (like me) prefer to write as little as possible in such a static language. Others like the static typing and the associated tool support, and prefer to put more code here. When merely shooting for a little bit of dynamism through the DSL engine in the layers above, most of the code could be written in Java, and a fluent API in the dynamic language could be enough. When shooting for rapid feature turn-around and a lot more flexibility, most of the code could be in the DSL or in the external dynamic language.

The answer to this question really depends on things like the requirements, team structure, skill-sets, and other such situational factors.

OK, so where’s the code?

My intention with this post was to stay at a high level – and talk of how one could structure an application to make it possible to embed a scripting engine into it, and to give an overview of the possibilities this creates. In subsequent posts, I will talk about how actual DSLs can be created, tested, and also how a team might be structured around it.