Logging with Logstash

Recently I started looking for an open-source, easy-to-use central logging system. I read a lot of articles on the internet and found two good solutions: Graylog2 and LogStash. Both can operate as a syslog server and store logs in a database. I tried both systems, and both have small problems - though the problems are different.

Graylog2

Graylog has a very good-looking interface and a relatively simple configuration file. It has a lot of parameters, but the default configuration file is well documented and easy to read.

Graylog uses MongoDB to store some small things and ElasticSearch to store and search logs. The main advantage of this solution is the separate web interface: it can be installed anywhere, it just needs a little configuration to connect to the same ES/Mongo instances the Graylog server uses.

The web UI makes it easy to search and filter logs, blacklist some entries, and create “channels” - named filter settings that let you follow what is happening in a specific context.

But as I started using the web interface and the server itself for everyday tasks, I found two problems:

  • The web interface does not make it easy to write filters, because neither the interface nor the documentation describes how filters are matched against messages. Since these filters are plain regular expressions, you have to know how to search for specific things. I struggled with it for a long time before I found a way to find anything. It needs a lot of improvement.
  • Graylog does not seem to take care of its input. After a week I noticed that logging had stopped working. After some investigation, I discovered that ElasticSearch was eating a lot of resources and generating a huge log. Reading that log, I found that ElasticSearch had choked on some invalid characters in its index files. I restarted the ES server and left it to recover for an hour, but no luck - it was stuck in the recovering state. Since I don't know ES and Graylog well, I asked Google about these problems, but found no good solution. So I decided to move on from Graylog.

LogStash

LogStash is a more robust logging solution. It is similar to syslog-ng, but it does not restrict itself to working as a syslog server (in other words, to taking input like a syslog server does): it can chew on anything that is text. It can use a lot of things as a log source (a minimal input example follows the list), e.g.:

  • Syslog server (it can act as a syslog server, but see below)
  • Plain file
  • Raw TCP/UDP socket
  • Output of any executable
  • IRC/Jabber channels
  • Twitter
  • Stdin
  • Windows Eventlog
  • and so on…
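
As a tiny illustration of the input side, a sketch like this tails a plain file and tags the resulting events (the path and type are made-up example values):

Hypothetical file input
input {
  file {
    # watch a log file directly; path and type here are just examples
    path => "/var/log/nginx/access.log"
    type => "nginx-access"
  }
}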

And it has numerous outputs too. What is important for me is that it can use ElasticSearch and some NoSQL databases as a log-storing backend.

It has a built-in web interface for searching log records, but Kibana is a slightly better web interface for searching against ElasticSearch; it makes searching and filtering easier.

My current configuration is very simple:

  • There is a central syslog server, provided by LogStash
  • All my servers log to it via RSyslog (the default syslog daemon on Debian Linux)
  • LogStash is configured to store log messages in ElasticSearch after some parsing

Basically, I followed the official recipe for this setup, except for one thing: somehow I couldn't get the date parser working the way the recipe says, so I tricked it out.

First, I created an Rsyslog config snippet, which I uploaded to /etc/rsyslog.d on every server:

/etc/rsyslog.d/to-logstash.conf
$template LOGSTASH,"<%PRI%>%timegenerated:::date-rfc3164-buggyday% %HOSTNAME% %APP-NAME%: %msg:::drop-last-lf%\n"
$ActionForwardDefaultTemplate LOGSTASH
*.*   @1.2.3.4:514
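
(The single @ in the last line means UDP forwarding; use @@ for TCP. After dropping this file into /etc/rsyslog.d/, restart rsyslog - on Debian: service rsyslog restart - so it picks up the new rule.)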

Note the %timegenerated:::date-rfc3164-buggyday% directive. The Rsyslog documentation says it simply pads the day number with a zero if it is smaller than 10; it was introduced to emulate a syslog-ng 'bug'.
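
For example, on the first day of the month the two formats differ like this (sample timestamps for illustration):

Oct  1 19:01:09   (stock RFC3164: day padded with a space)
Oct 01 19:01:09   (buggyday: day padded with a zero, always two digits)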

After that, I created the following configuration for LogStash:

LogStash configuration
input {
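  # listen as a syslog server on TCP and UDP port 514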
  tcp {
    port => 514
    type => syslog
  }

  udp {
    port => 514
    type => syslog
  }
}

filter {
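  # parse the raw syslog line into structured fields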
  grok {
      type => "syslog"
      pattern => [ "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{PROG:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
      add_field => [ "received_at", "[email protected]}" ]
      add_field => [ "received_from", "[email protected]_host}" ]
  }
  syslog_pri {
      type => "syslog"
  }

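  # use the parsed syslog timestamp as the event's @timestamp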
  date {
      type => "syslog"
  #    syslog_timestamp => [ "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
      syslog_timestamp => [ "MMM dd HH:mm:ss" ]
  }

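  # for successfully parsed messages, overwrite the generic fields with the parsed values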
  mutate {
      type => "syslog"
      exclude_tags => "_grokparsefailure"
      replace => [ "@source_host", "%{syslog_hostname}" ]
      replace => [ "@message", "%{syslog_message}" ]
  }
  mutate {
      type => "syslog"
      remove => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
  }
}


output {
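  # store events in a daily index on the external ES instance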
  elasticsearch {
    index => "syslog-%{+YYYY.MM.dd}"
    embedded => false
  }

  #stdout { debug => true debug_format => json }
}

# vim: ts=2 sw=2 et

Note the commented-out date mapping. It is what the original recipe says, but with it I always got a parser exception from DateFormat complaining about invalid data in the date. With the buggyday hack on the Rsyslog side the day is always two digits long, so it should always parse - I hope. We will find out on Nov 1 at the latest.

The commented-out stdout output at the end is just for debugging: it prints each parsed event as JSON, which is an easy way to check that all your filters work correctly.
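
For completeness: I run the whole thing from the monolithic jar, with something like java -jar logstash-1.1.0-monolithic.jar agent -f /etc/logstash/logstash.conf (the version number and config path are just examples; adjust them to your setup).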

I struggled with parsing the messages because my old rsyslog template had a typo and I could not figure out what was failing. After some digging, I found that the jls-grok RubyGem contains the same parser LogStash uses, so you can check your filters in IRB. Just make sure you use Ruby 1.9.3, because Ruby 1.8 throws a SyntaxError when you require 'grok-pure' (which is the recommended way, as the plain 'grok' looks for a C library).

Simple example for testing Grok filter
require 'rubygems'
require 'grok-pure'

grok = Grok.new

# I extracted patterns dir from jar package
Dir.glob('patterns/*').each { |f| grok.add_patterns_from_file f }
s="Oct 27 19:01:09"
pattern = '%{SYSLOGTIMESTAMP:syslog_timestamp}'
grok.compile(pattern)
p grok.match(s).match.to_a # => ["Oct 27 19:01:09", "Oct 27 19:01:09", "Oct", "27", "19:01:09", "19", "01", "09"]

You can use exactly the same patterns as in your LogStash configuration file.
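
For example, you can test the whole first-stage pattern from the config above against a sample line (the log line below is made up, but follows the format the Rsyslog template produces):

Testing the full syslog pattern
# continuing the IRB session from the previous example
line = '<38>Oct 27 19:01:09 myhost sshd: Accepted publickey for root from 1.2.3.4'
pattern = '<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} ' \
          '%{SYSLOGHOST:syslog_hostname} %{PROG:syslog_program}' \
          '(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}'
grok.compile(pattern)
m = grok.match(line)
# grok.match returns false when the pattern does not match
p m ? m.match.to_a : "no match - fix your pattern or your template"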

Note: if you see a _grokparsefailure tag in your tag list, something is wrong: the pattern in the grok filter does not match the syslog messages coming from Rsyslog. The good news is that the mutate filters skip such messages (that is what exclude_tags => "_grokparsefailure" is for), so @message is left untouched (as the second mutate from the end shows, we normally replace @message with the content of syslog_message, then remove the now-unnecessary fields in the last mutate filter). If a message somehow falls through the grok filter and loses its original (raw) content anyway, simply comment out the last two filters by prefixing their lines with a hash mark.

As you can see, the config uses an external ES server, not the embedded one - simply because I already have a working, configured ES instance and I'd like to use it. There is no other reason to avoid the embedded ES server: if you don't have an ES instance yet, just use the one shipped with LogStash.
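
If the external instance cannot be found via the default multicast discovery, you can point the output at it explicitly - something like the sketch below, where the hostname is a placeholder and the host option is as I remember it from the 1.1-era docs (check your version):

Pointing the output at a specific ES host
output {
  elasticsearch {
    host => "es.example.com"   # hypothetical hostname of the external ES node
    index => "syslog-%{+YYYY.MM.dd}"
    embedded => false
  }
}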

The embedded web interface of LogStash is a bit tricky: I could not start it via the java command the way the documentation says, so I had to extract the jar file and run it standalone to check out the user interface. Don't expect anything fancy: it is just a text field with a button, plus a link with a basic query string that filters messages generated today.

Conclusion

I think Graylog is a great logging solution, but it needs some love to keep invalid characters in the stored data from crashing the ES server. Still, the web interface and easy configuration make it really, really lovely.

LogStash is a good solution if you want to handle multiple log sources, validate or manipulate your log messages, or distribute logs to multiple destinations. I think LogStash is a little overkill if you just want a central syslog server - but it works as expected. So I put my two cents on LogStash rather than Graylog - even though I would have preferred Graylog.
