Friday, August 2, 2013

Build Your Own Splunk-Like Central Log Management Tool With Open Source Software


In the age of big data, log management is becoming an absolute necessity, as developers, operations, and, yes, DevOps have to deal with and process huge amounts of machine-generated data. Many organizations have turned to Splunk, a pioneer in the space, to help manage the rising tide of log data – but Splunk can get really, really expensive.

There’s still no single general-purpose alternative to Splunk. But over the weekend, Booking.com SysAdmin Brad Lhotsky documented his quest to build his own central log management system using only open source software.
Of course, his blog entry contains much deeper technical insight, but at a high level, he broke his solution down into three components: log centralization (rsyslog), log management (logstash/Kibana), and log visualization (Graphite).
Rsyslog was tapped for log centralization over the similarly popular alternative syslog-ng because the former offers guaranteed delivery and encrypted transfer in its open source edition: two features that Lhotsky says are increasingly important to regulatory compliance auditors. With rsyslog, Lhotsky was able to build a reliable way to transport event logs from Unix hosts to a central repository.
Log management is where Lhotsky starts entering Splunk’s territory, calling the company “the 1,000 lb Gorilla in the room.” But in lieu of Splunk, Lhotsky writes that he took the MongoDB-powered Graylog2 for a test drive before settling on logstash. Graylog2 is great, he says, but he suggests that its ElasticSearch indexing scheme is “broken,” so if you have to keep a large amount of logs around for compliance reasons, you’re going to take a performance hit. Lhotsky goes so far as to speculate that this is because Graylog2 only implemented ElasticSearch for, well, search fairly late in the game.
On the other side of the coin, logstash also uses ElasticSearch, but with far more of a focus on scalability, inputs, filters, and outputs. The cost, Lhotsky writes, is the lack of a polished front-end. Enter Kibana, a PHP front-end for logstash that builds search and analysis on top of the ElasticSearch indexes, making the whole platform a lot more usable.
“Kibana fills the gap with the Logstash interface so perfectly. It doesn’t give me everything I’d get with Splunk, but I’ve just touched the functionality I can extract with Logstash,” as Lhotsky puts it.
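To make the input, filter, and output stages mentioned above more concrete, here is a minimal Python sketch of the idea. It is not logstash code and not Lhotsky's configuration: the regular expression, the function names, and the sample log line are all assumptions made up for illustration. It only shows the general shape of turning raw syslog-style lines into structured JSON events of the kind a search index could ingest.

```python
import json
import re
from typing import Iterable, Iterator, List, Optional

# Hypothetical pattern for a classic syslog-style line, for example:
# "Aug  2 10:15:01 web01 sshd[1234]: Failed password for root"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def read_input(lines: Iterable[str]) -> Iterator[str]:
    """Input stage: yield raw lines, dropping trailing newlines and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line


def apply_filter(raw: str) -> Optional[dict]:
    """Filter stage: parse a raw line into a structured event.

    Returns None when the line does not match the assumed pattern.
    """
    match = SYSLOG_LINE.match(raw)
    if match is None:
        return None
    event = match.groupdict()
    if event["pid"] is not None:
        event["pid"] = int(event["pid"])
    return event


def write_output(events: Iterable[dict]) -> List[str]:
    """Output stage: serialize each event as one JSON document."""
    return [json.dumps(event, sort_keys=True) for event in events]


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Chain the three stages: input -> filter -> output."""
    parsed = (apply_filter(raw) for raw in read_input(lines))
    return write_output(event for event in parsed if event is not None)
```

Calling `run_pipeline` on a list of raw lines returns one JSON string per line that matched; unparseable lines are silently dropped here, whereas a real deployment would more likely tag and keep them.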
Finally, he suggests the popular Graphite for data visualization and graphing all the log data you’ve now collected.
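As a rough illustration of how log data can end up in Graphite, the sketch below counts events per minute and formats them for Graphite's plaintext protocol, where each line has the form `metric.path value timestamp` and is sent to the Carbon listener (port 2003 by default). The metric prefix `logs`, the function names, and the sample events are assumptions for illustration; the source does not say how Lhotsky feeds Graphite.

```python
import socket
from collections import Counter
from typing import Iterable, List, Tuple


def count_per_minute(events: Iterable[Tuple[int, str]]) -> Counter:
    """Count events per (minute, program).

    Each event is a (unix_timestamp, program_name) pair.
    """
    counts: Counter = Counter()
    for timestamp, program in events:
        minute = timestamp - timestamp % 60  # round down to the minute
        counts[(minute, program)] += 1
    return counts


def to_graphite_lines(counts: Counter, prefix: str = "logs") -> List[str]:
    """Format counts as Graphite plaintext lines: 'path value timestamp'."""
    lines = []
    for (minute, program), value in sorted(counts.items()):
        # Dots separate path components in Graphite, so replace them in names.
        safe_program = program.replace(".", "_")
        lines.append(f"{prefix}.{safe_program}.count {value} {minute}")
    return lines


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send newline-terminated metric lines to a Carbon plaintext listener."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A caller would build the lines with `to_graphite_lines(count_per_minute(events))` and pass them to `send_to_graphite` along with the hostname of its own Carbon server.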
As Lhotsky says, this is just how he tried to match Splunk-like functionality with open source tools, and it’s still a work in progress. He’s on Twitter if you want to talk to him about this implementation directly, or else leave a comment below.
Regardless, there’s a definite and growing need for log management tools, and I’m wondering why Splunk is still relatively unchallenged in the space.


Commentary:

I first came across Splunk in an article by Thái (Thaidn) on using Splunk for pre-analysis of access.log as part of passive DDoS mitigation. After trying it out, I found that Splunk really is the best tool I have ever seen for managing, analyzing, aggregating, and presenting logs and (structured) big data. That is the main reason Splunk sells for $5000 (base tier) for the Enterprise edition.

Because of that steep price I have always been looking for an alternative. Unfortunately, Splunk is expensive because it is good and, for now, one of a kind :D

The article above describes one of several ways to combine open source tools to get some of the main features that Splunk provides.