Simple Python Syslog Counter

Recently I did a Packet Pushers episode about log management. In it, I mentioned some of the custom Python scripts that I run to do basic syslog analysis, and someone asked about them in the comments.

The script I'm presenting here isn't one of the actual ones that I run in production, but it's close. The real one sends emails, does DNS lookups, keeps a "rare messages" database using sqlite3, and a few other things, but I wanted to keep this simple.

One of the problems I see with getting started with log analysis is that people tend to approach it like a typical vendor RFP project: list some requirements, survey the market, evaluate and buy a product to fit your requirements. Sounds good, right? The problem with log analysis is that often you don't know what your requirements really are until you start looking at data.

A simple message counting script like this lets you look at your data, and provides a simple platform on which you can start to iterate to find your specific needs. It also lets us look at some cool Python features.

I don't recommend pushing this too far: once you have a decent idea of what your data looks like and what you want to do with it, set up Logstash, Graylog2, or a similar commercial product like Splunk (if you can afford it).

That said, here's the Python:

I tried to make this as self-documenting as possible. You run it from the CLI with a syslog file as the argument, and you get this:

$ python sample.txt
 10    LINK-3-UPDOWN
 2     SSH-5-SSH2_CLOSE


     2     LINK-3-UPDOWN

[Stuff deleted for brevity]

For Pythonistas, the script makes use of a few cool language features:

    Named, Compiled Regexes

    • We can name a regex match group with the (?P<name>PATTERN) syntax, which makes it easy to understand when it's referenced later with the .group('name') method on the match object.
    • This is demonstrated in lines 36-39 and 58-59 of the gist shown above. 
    • It would be more efficient to capture these fields by splitting the line with the .split() string method, but I wanted the script to work for unknown field positions -- hence the regex. 
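
To make the named-group idea concrete, here is a minimal sketch. The LINE_RE pattern, the sample log line, and the 192.0.2.10 address are assumptions made up for illustration; they are not the patterns used in the gist.

```python
import re

# Hypothetical pattern: a dotted-quad reporter address followed, somewhere
# later on the line, by a Cisco-style %FACILITY-SEVERITY-MNEMONIC: tag.
LINE_RE = re.compile(
    r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})'  # named group: reporter address
    r'.*?'                                    # anything in between (non-greedy)
    r'%(?P<msg>[^:\s]+):'                     # named group: tag up to the colon
)

# Made-up sample line; real syslog layouts vary by device and collector.
sample_line = ('Jul  2 10:15:01 192.0.2.10 42: '
               '%LINK-3-UPDOWN: Interface Gi0/1, changed state to up')

match = LINE_RE.search(sample_line)
reporter = match.group('reporter')   # look the field up by name, not position
msg = match.group('msg')
fields = match.groupdict()           # all named groups as a dict

print(reporter, msg)
```

Because the groups are looked up by name, the pattern can be reordered or extended without touching the code that reads the fields.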

    Multiplication of Strings

    • We control indentation by multiplying the ' ' string (that's a single space enclosed in quotes) by an integer value in the print_counter function (line 50).
      • The reason this works is that the Python str class defines a special __mul__ method that controls how the * operator works for objects of that class:
        >>> 'foo'.__mul__(3)
        'foofoofoo'
        >>> 'foo' * 3
        'foofoofoo'
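
As a rough sketch of how string multiplication can drive indentation, the helper below pads each output line by a given depth. The format_counts function and its column width are hypothetical; the gist's print_counter function may well look different.

```python
def format_counts(pairs, indent=0):
    """Return one formatted line per (name, count) pair, indented by `indent` spaces."""
    pad = ' ' * indent  # ' ' * 0 is the empty string, so indent=0 adds nothing
    # Left-justify the count in a 6-character column, then append the name.
    return [f'{pad}{count:<6}{name}' for name, count in pairs]


# Made-up counts, for illustration only.
pairs = [('LINK-3-UPDOWN', 10), ('SSH-5-SSH2_CLOSE', 2)]

top_level = format_counts(pairs)            # no indentation
nested = format_counts(pairs, indent=4)     # four leading spaces

print('\n'.join(top_level + nested))
```

Passing a larger indent for the per-device section is enough to set it visually apart from the overall totals.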

    collections.Counter Objects

    • Counter objects are a subclass of dictionaries that know how to count things. Jeremy Schulman talked about these in a comment on the previous post. Here, we use Counters to build both the overall message counts and the per-device message counts:
    >>> import re
    >>> my_msg = 'timestamp ip_address stuff %MY-4-MESSAGE:other stuff'
    >>> CISCO_MSG = re.compile('%(?P<msg>.*?):')
    >>> from collections import Counter
    >>> test_counter = Counter()
    >>> this_msg = re.search(CISCO_MSG, my_msg).group('msg')
    >>> this_msg
    'MY-4-MESSAGE'
    >>> test_counter[this_msg] += 1
    >>> test_counter
    Counter({'MY-4-MESSAGE': 1})
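
A slightly longer sketch shows two Counter behaviors that are handy for reporting: missing keys read as zero, and most_common() returns items sorted by count. The message names below are made up for illustration.

```python
from collections import Counter

# Made-up message tags standing in for what the regex would extract.
messages = ['LINK-3-UPDOWN', 'SSH-5-SSH2_CLOSE', 'LINK-3-UPDOWN', 'LINK-3-UPDOWN']

msg_counts = Counter()
for m in messages:
    msg_counts[m] += 1          # no KeyError: unseen keys start at 0

top = msg_counts.most_common(1)         # list of (item, count), highest first
missing = msg_counts['NEVER-0-SEEN']    # reading an unseen key returns 0
still_absent = 'NEVER-0-SEEN' not in msg_counts  # the lookup did not add the key

print(top, missing, still_absent)
```

Counter(messages) would build the same counts in one call; the explicit loop is shown because it mirrors counting line by line while reading a file.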

      collections.defaultdict Dictionaries

      • It can get annoying when you're assigning dictionary values inside a loop, because you get a KeyError when the key doesn't exist yet. This is a contrived example, but it illustrates the point:

        >>> reporters = {}
        >>> for reporter in ['','']:
        ...     reporters[reporter].append('foo')
        Traceback (most recent call last):
          File "<stdin>", line 2, in <module>
        KeyError: ''

      • To fix this, you can catch the exception:

        >>> reporters = {}
        >>> for reporter in ['','']:
        ...     try:
        ...         reporters[reporter].append('foo')
        ...         reporters[reporter].append('bar')
        ...     except KeyError:
        ...         reporters[reporter] = ['foo']
        ...         reporters[reporter].append('bar')
      • As usual, though, Python has a more elegant way in the collections module: defaultdict
      >>> from collections import defaultdict
      >>> reporters = defaultdict(list)
      >>> for reporter in ['','']:
      ...     reporters[reporter].append('foo')
      ...     reporters[reporter].append('bar')
      >>> reporters
      defaultdict(<class 'list'>, {'': ['foo', 'bar'], '': ['foo', 'bar']})
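
For comparison, here is a small runnable sketch that groups messages per reporter with defaultdict(list) and with the plain-dict setdefault() method. The addresses come from the 192.0.2.0/24 documentation range and, like the message names, are placeholders chosen for illustration.

```python
from collections import defaultdict

# Placeholder (reporter, message) pairs; not taken from real logs.
pairs = [
    ('192.0.2.10', 'LINK-3-UPDOWN'),
    ('192.0.2.20', 'SSH-5-SSH2_CLOSE'),
    ('192.0.2.10', 'SYS-5-CONFIG_I'),
]

# defaultdict calls list() to create the value the first time a key is used.
by_default = defaultdict(list)
for reporter, msg in pairs:
    by_default[reporter].append(msg)

# The same grouping with a plain dict: setdefault inserts [] if the key is new.
by_setdefault = {}
for reporter, msg in pairs:
    by_setdefault.setdefault(reporter, []).append(msg)

print(dict(by_default))
```

Both approaches avoid the try/except dance; defaultdict tends to read more cleanly when the same default applies everywhere the dictionary is used.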
      In the syslog counter script, we use a collections.Counter object as the type for our defaultdict. This allows us to build a per-syslog-reporter dictionary that shows how many times each message appears for each reporter, while only looping through the input once (line 66):

       per_reporter_counts[reporter][msg] += 1

      Here, the dictionary per_reporter_counts has the IPv4 addresses of the syslog reporters as keys, with a Counter object as the value holding the counts for each message type:

      >>> from collections import Counter,defaultdict
      >>> per_reporter_counts = defaultdict(Counter)
      >>> per_reporter_counts['']['SOME-5-MESSAGE'] += 1
      >>> per_reporter_counts
      defaultdict(<class 'collections.Counter'>, {'': Counter({'SOME-5-MESSAGE': 1})})
      >>> per_reporter_counts['']['SOME-5-MESSAGE'] += 5
      >>> per_reporter_counts
      defaultdict(<class 'collections.Counter'>, {'': Counter({'SOME-5-MESSAGE': 6})})
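
Putting the pieces together, a single pass over the input might look roughly like the sketch below. This is not the gist itself: the count_messages function, the REPORTER pattern, and the sample lines (with addresses from the 192.0.2.0/24 documentation range) are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns: first dotted-quad on the line is treated as the reporter.
REPORTER = re.compile(r'(?P<reporter>\d{1,3}(?:\.\d{1,3}){3})')
CISCO_MSG = re.compile(r'%(?P<msg>.*?):')


def count_messages(lines):
    """Return (overall counts, per-reporter counts) from one pass over `lines`."""
    overall = Counter()
    per_reporter = defaultdict(Counter)
    for line in lines:
        reporter = REPORTER.search(line)
        msg = CISCO_MSG.search(line)
        if not (reporter and msg):
            continue  # skip lines missing either field
        overall[msg.group('msg')] += 1
        per_reporter[reporter.group('reporter')][msg.group('msg')] += 1
    return overall, per_reporter


# Made-up sample input; a real run would iterate over an open syslog file.
sample = [
    'Jul  2 10:15:01 192.0.2.10 41: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down',
    'Jul  2 10:15:03 192.0.2.10 42: %LINK-3-UPDOWN: Interface Gi0/1, changed state to up',
    'Jul  2 10:16:40 192.0.2.20 17: %SSH-5-SSH2_CLOSE: SSH2 Session closed',
    'Jul  2 10:17:00 this line has no reporter or message tag',
]

overall, per_reporter = count_messages(sample)
for reporter, counts in sorted(per_reporter.items()):
    print(reporter, counts.most_common())
```

Because a file object is itself an iterable of lines, count_messages(open(path)) would work the same way without loading the whole file into memory.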

      If you got this far, you can go implement it for IPv6 addresses. :-)

      Published: July 02 2014
