In the previous part of this series, we learnt how we could store log data so it’s easy to get insights from it later. For instance, we use the GELF log format on Zolodeck, our side project. In this part, we’ll look at how to actually get insights from our logs, using a bunch of open source tools.
Here are the three simple steps to get insights from our logs:
1) Write Logs
2) Transport Logs
3) Process Logs
In part 3, we saw how logging in a standard JSON format is beneficial. Some of my readers asked why we don’t use Clojure or Ruby data structures instead of JSON. Here’s why JSON is the better choice:
- JSON is accessible from all languages
- There are already a bunch of tools available to transport and process logs that accept JSON
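To make the JSON point concrete, here is a minimal sketch of a formatter that writes one JSON object per log line using GELF 1.1 field names. It is written in Python purely for illustration and is not the code we run on Zolodeck; the class name `GelfStyleFormatter` and the `fields` attribute convention are assumptions made up for this example.

```python
import json
import logging
import socket


class GelfStyleFormatter(logging.Formatter):
    """Render each log record as a single JSON object with GELF 1.1 field names."""

    # GELF uses syslog severities for "level", so map Python's levels onto them.
    SYSLOG_LEVELS = {
        logging.DEBUG: 7,
        logging.INFO: 6,
        logging.WARNING: 4,
        logging.ERROR: 3,
        logging.CRITICAL: 2,
    }

    def format(self, record):
        payload = {
            "version": "1.1",
            "host": socket.gethostname(),
            "short_message": record.getMessage(),
            "timestamp": record.created,  # seconds since the epoch
            # Unknown custom levels fall back to 6 (informational).
            "level": self.SYSLOG_LEVELS.get(record.levelno, 6),
            "_logger": record.name,
        }
        # Hypothetical convention: callers attach custom data via
        # extra={"fields": {...}}; GELF requires a "_" prefix on such fields.
        for key, value in getattr(record, "fields", {}).items():
            payload["_" + key] = value
        return json.dumps(payload)
```

Attaching this formatter to a `logging.FileHandler` gives a local log file in which every line can be parsed by any language with a JSON library.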
Always write logs to local disk. It is tempting to have a log4j appender that sends logs directly to a remote server using UDP or HTTP. Unfortunately, you can’t guarantee that those servers will always be reachable, and any log sent while they are down is lost. So it’s better to write to local disk first, and then transport your logs to wherever they are going to be processed. There are many open source tools available for transporting logs, and the right choice depends on which tool you end up using to process them. Some tools that you can use for transporting logs are:
- Scribe – Facebook open-sourced this. This tool does more than just transporting logs.
- Logstash – This tool does a lot more than transporting logs.
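The "write to local disk first, transport later" idea can be sketched in a few lines. This is a toy illustration and not how Scribe or Logstash work internally: the function `ship_new_lines`, the offset file, and the `send` callable are all hypothetical names invented for this example. The point is that when the remote end is unreachable, nothing is lost, because the log file is still on disk and the shipper simply resumes from the last delivered line.

```python
import os


def ship_new_lines(log_path, offset_path, send):
    """Forward lines appended to log_path since the last successful run.

    `send` is a hypothetical callable that delivers one line to the remote
    collector and raises an exception on failure.
    """
    # Read the byte offset reached last time (0 on the first run).
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)

    shipped = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            try:
                send(line.decode("utf-8").rstrip("\n"))
            except Exception:
                break  # remote unreachable: stop and retry this line next run
            offset = log.tell()
            shipped += 1

    # Persist progress only for lines that were actually delivered.
    with open(offset_path, "w") as f:
        f.write(str(offset))
    return shipped
```

Running this periodically (from cron, for example) gives at-least-once delivery: a line is only marked as shipped after `send` returns without raising.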
We need to be able to collect, index, and search through our log data for anything we care to find. Some open-source tools out there for processing logs are:
Both Logstash (Kibana) and Graylog2 provide web interfaces that make life easy when analyzing and searching the underlying logs.
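To see why structured logs make this kind of searching cheap, here is a rough sketch of a field-based search over JSON log lines. It is only an illustration of the idea, not a substitute for the indexing these tools do; the function name `search_logs` and the sample field names are assumptions for this example.

```python
import json


def search_logs(lines, **criteria):
    """Yield parsed log entries whose fields equal every given criterion."""
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if all(entry.get(key) == value for key, value in criteria.items()):
            yield entry
```

For example, `list(search_logs(open("app.log"), level=3, host="web-1"))` would return every error-level entry from a host named `web-1`, with no regular expressions involved.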
As you can see there are many options for managing and analyzing logs. Here’s what we currently do in our project:
It is simple for now, and we’re hoping to keep it that way.
Logs are useful. Logs that provide us with insights are more useful :), and if they do this easily, they’re even more so. When we start a new project, we need to spend some time thinking about logging up front, since this is a crucial part of managing growth. Thanks to a variety of open-source tools and libraries, it is not that expensive to try out different logging strategies. A properly thought-out logging architecture will save you a lot of time later on. I hope this series has shed some light on logging and why it’s important in this age of distributed computing.
Please do share your experiences and how you’re handling logging on your projects. What went well? What didn’t? I’d love to see what folks are doing out there, and document it here, to make this knowledge available to others. Onward!