Simple Python Syslog Counter
Recently I did a Packet Pushers episode about log management. In it, I mentioned some of the custom Python scripts that I run to do basic syslog analysis, and someone asked about them in the comments.
The script I'm presenting here isn't one of the actual ones that I run in production, but it's close. The real one sends emails, does DNS lookups, keeps a "rare messages" database using sqlite3, and a few other things, but I wanted to keep this simple.
One of the problems I see with getting started with log analysis is that people tend to approach it like a typical vendor RFP project: list some requirements, survey the market, evaluate and buy a product to fit your requirements. Sounds good, right? The problem with log analysis is that often you don't know what your requirements really are until you start looking at data.
A simple message counting script like this lets you look at your data, and provides a simple platform on which you can start to iterate to find your specific needs. It also lets us look at some cool Python features.
I don't recommend pushing this too far: once you have a decent idea of what your data looks like and what you want to do with it, set up Logstash, Graylog2, or a similar commercial product like Splunk (if you can afford it).
That said, here's the Python:
I tried to make this as self-documenting as possible. You run it from the CLI with a syslog file as the argument, and you get this:
$ python simple_syslog_count.py sample.txt
214 SEC-6-IPACCESSLOGP
15 SEC-6-IPACCESSLOGRL
10 LINEPROTO-5-UPDOWN
10 LINK-3-UPDOWN
7 USER-3-SYSTEM_MSG
4 STACKMGR-4-STACK_LINK_CHANGE
4 DUAL-5-NBRCHANGE
3 IPPHONE-6-UNREGISTER_NORMAL
3 CRYPTO-4-PKT_REPLAY_ERR
3 SEC-6-IPACCESSLOGRP
3 SEC-6-IPACCESSLOGSP
2 SSH-5-SSH2_USERAUTH
2 SSH-5-SSH2_SESSION
2 SSH-5-SSH2_CLOSE
10.1.16.12
6 SEC-6-IPACCESSLOGP
10.1.24.3
2 LINEPROTO-5-UPDOWN
2 LINK-3-UPDOWN
[Stuff deleted for brevity]
For Pythonistas, the script makes use of a few cool language features:
Named, Compiled Regexes
- We can name a regex match with the (?P<name>PATTERN) syntax, which makes it easy to understand when it's referenced later with the .group('name') method on the match object.
- This is demonstrated in lines 36-39 and 58-59 of the gist shown above.
- It would be more efficient to capture these fields by splitting the line with the .split() string method, but I wanted the script to work for unknown field positions -- hence the regex.
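As a rough illustration of the named-group idea, here is a minimal sketch. The two patterns and the sample lines are assumptions made up for this example, not the ones from the gist; the point is only that the same named groups pull out the reporter and message wherever those fields sit in the line.

```python
import re

# Hypothetical patterns; the ones in the original script may differ.
REPORTER = re.compile(r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})')
CISCO_MSG = re.compile(r'%(?P<msg>.*?):')

# Two made-up lines with the same fields in different positions.
lines = [
    'Mar  1 12:00:01 10.1.16.12 %SEC-6-IPACCESSLOGP: list 101 denied',
    '10.1.24.3 local7 Mar  1 12:00:02 %LINK-3-UPDOWN: Interface changed state',
]

parsed = []
for line in lines:
    # .group('name') reads better than a positional .group(1)
    reporter = REPORTER.search(line).group('reporter')
    msg = CISCO_MSG.search(line).group('msg')
    parsed.append((reporter, msg))

print(parsed)
```

A .split()-based version would need to know which column holds the address, which is exactly what differs between the two sample lines.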
Multiplication of Strings
- We control indentation by multiplying the ' ' string (that's a single space enclosed in quotes) by an integer value in the print_counter function (line 50).
- The reason this works is that the Python str class defines a special __mul__ method that controls how the * operator works for objects of that class:
>>> 'foo'.__mul__(3)
'foofoofoo'
>>> 'foo' * 3
'foofoofoo'
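A sketch of how string multiplication might drive the indentation. The function name print_counter comes from the post, but its signature and body here are assumptions, and format_counter is a hypothetical helper added so the result is easy to inspect.

```python
from collections import Counter

def format_counter(counter, indent=0):
    """Return one 'count message' string per entry, most frequent first."""
    prefix = ' ' * indent  # a single space repeated `indent` times
    return [f'{prefix}{count} {msg}' for msg, count in counter.most_common()]

def print_counter(counter, indent=0):
    """Print the formatted lines (assumed signature, not the gist's)."""
    for line in format_counter(counter, indent):
        print(line)

# Made-up counts for illustration.
counts = Counter({'LINEPROTO-5-UPDOWN': 2, 'LINK-3-UPDOWN': 1})
print_counter(counts)            # overall counts, flush left
print_counter(counts, indent=4)  # per-reporter counts, indented four spaces
```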
collections.Counter Objects
- Counter objects are a subclass of dictionaries that know how to count things. Jeremy Schulman talked about these in a comment on the previous post. Here, we use Counters to build both the overall message counts and the per-device message counts:
>>> import re
>>> my_msg = 'timestamp ip_address stuff %MY-4-MESSAGE:other stuff'
>>> CISCO_MSG = re.compile(r'%(?P<msg>.*?):')
>>> from collections import Counter
>>> test_counter = Counter()
>>> this_msg = re.search(CISCO_MSG,my_msg).group('msg')
>>> this_msg
'MY-4-MESSAGE'
>>> test_counter[this_msg] += 1
>>> test_counter
Counter({'MY-4-MESSAGE': 1})
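Two Counter behaviors are worth seeing side by side: a missing key reads as zero instead of raising KeyError, and most_common() returns entries sorted by count, which is one plausible way to get output ordered like the sample run above. The message list below is invented for the example.

```python
from collections import Counter

# Made-up message types, as they might be extracted from a log.
messages = [
    'LINK-3-UPDOWN',
    'SEC-6-IPACCESSLOGP',
    'SEC-6-IPACCESSLOGP',
    'LINK-3-UPDOWN',
    'SEC-6-IPACCESSLOGP',
    'SSH-5-SSH2_CLOSE',
]

counts = Counter(messages)         # count every item in one call
top_two = counts.most_common(2)    # highest counts first
never_seen = counts['NOT-1-SEEN']  # missing keys read as 0, no KeyError

print(top_two)
print(never_seen)
```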
collections.defaultdict Dictionaries
- It can get annoying when you're assigning dictionary values inside a loop, because you get errors when the key doesn't exist yet. This is a contrived example, but it illustrates the point:
>>> reporters = {}
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     reporters[reporter].append('foo')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyError: '1.1.1.1'
- To fix this, you can catch the exception:
>>> reporters = {}
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     try:
...         reporters[reporter].append('foo')
...         reporters[reporter].append('bar')
...     except KeyError:
...         reporters[reporter] = ['foo']
...         reporters[reporter].append('bar')
...
- As usual, though, Python has a more elegant way in the collections module: defaultdict
>>> from collections import defaultdict
>>> reporters = defaultdict(list)
>>> for reporter in ['1.1.1.1','2.2.2.2']:
...     reporters[reporter].append('foo')
...     reporters[reporter].append('bar')
...
>>> reporters
defaultdict(<class 'list'>, {'1.1.1.1': ['foo', 'bar'], '2.2.2.2': ['foo', 'bar']})
- In the syslog counter script, we use a collections.Counter object as the type for our defaultdict. This allows us to build a per-syslog-reporter dictionary that shows how many times each message appears for each reporter, while only looping through the input once (line 66):
per_reporter_counts[reporter][msg] += 1
Here, the dictionary per_reporter_counts has the IPv4 addresses of the syslog reporters as keys, with a Counter object as the value holding the counts for each message type:
>>> from collections import Counter,defaultdict
>>> per_reporter_counts = defaultdict(Counter)
>>> per_reporter_counts['1.1.1.1']['SOME-5-MESSAGE'] += 1
>>> per_reporter_counts
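defaultdict(<class 'collections.Counter'>, {'1.1.1.1': Counter({'SOME-5-MESSAGE': 1})})
>>> per_reporter_counts['1.1.1.1']['SOME-5-MESSAGE'] += 5
>>> per_reporter_counts
defaultdict(<class 'collections.Counter'>, {'1.1.1.1': Counter({'SOME-5-MESSAGE': 6})})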
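Putting the pieces together, a single pass over the input could look roughly like the sketch below. This is not the gist itself: the count_messages function, both patterns, and the sample lines are assumptions chosen to show the overall Counter and the defaultdict(Counter) being filled in the same loop.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns; the ones in the original script may differ.
REPORTER = re.compile(r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})')
CISCO_MSG = re.compile(r'%(?P<msg>.*?):')

def count_messages(lines):
    """Return (overall counts, per-reporter counts) from one pass over lines."""
    overall = Counter()
    per_reporter_counts = defaultdict(Counter)
    for line in lines:
        reporter_match = REPORTER.search(line)
        msg_match = CISCO_MSG.search(line)
        if not (reporter_match and msg_match):
            continue  # skip lines that lack either field
        reporter = reporter_match.group('reporter')
        msg = msg_match.group('msg')
        overall[msg] += 1
        per_reporter_counts[reporter][msg] += 1
    return overall, per_reporter_counts

# Made-up input; a real run would iterate over the open syslog file instead.
sample = [
    'Mar  1 12:00:01 10.1.16.12 %SEC-6-IPACCESSLOGP: list 101 denied',
    'Mar  1 12:00:02 10.1.24.3 %LINK-3-UPDOWN: Interface changed state',
    'Mar  1 12:00:03 10.1.16.12 %SEC-6-IPACCESSLOGP: list 101 denied',
    'a line with no message or address',
]
overall, per_reporter_counts = count_messages(sample)
print(overall)
print(dict(per_reporter_counts))
```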
If you got this far, you can go implement it for IPv6 addresses. :-)
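One possible starting point for that exercise, offered as a sketch rather than a finished answer: instead of extending the regex, let the standard library's ipaddress module decide whether a token is an address. The find_reporter helper is hypothetical, and it assumes the reporter address appears as its own whitespace-separated token.

```python
import ipaddress

def find_reporter(line):
    """Return the first token that parses as an IPv4 or IPv6 address, else None."""
    for token in line.split():
        try:
            # ip_address() accepts both families and raises ValueError otherwise
            return str(ipaddress.ip_address(token))
        except ValueError:
            continue
    return None

# Made-up lines; 2001:db8::/32 is the IPv6 documentation prefix.
v6_reporter = find_reporter('Mar  1 12:00:01 2001:db8::1 %LINK-3-UPDOWN: Interface changed state')
v4_reporter = find_reporter('Mar  1 12:00:02 10.1.24.3 %LINK-3-UPDOWN: Interface changed state')
print(v6_reporter, v4_reporter)
```

This gives up the "unknown field positions inside a token" flexibility of the regex, so it only fits logs where the address stands alone between spaces.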