Python Development

Posted by Joe Topjian on July 19, 2010 under Development

Introduction

I first learned Python back in 2004-2005. I thought it was a great language and I loved programming and scripting with it. Then in 2007 I took on a job as a Windows systems administrator. The environment had no use for Python so I stopped using it.

A month ago I decided to pick it up again and began researching what the latest trends and best practices were. This article is a rough summary of that research. I will detail and explain my findings while walking through the development process of a small log parsing utility.

Exim Log Analyzer

The utility that I developed is a log analyzer for Exim. I already use one that I have heavily modified for cPanel-specific statistics. I thought it would be a good exercise to translate that modified version into a Python utility.

How it Works

The utility would be a series of Python modules glued together by a front module able to be run on the command line with two switches:

  • --logfile=/path/to/logfile
  • --config=/path/to/config

It would be executed like so:

    eximloganalyzer --logfile=/var/log/exim_mainlog --config=/etc/eximloganalyzer.cfg

or if the default file locations are used, simply:

$ eximloganalyzer
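
As a rough illustration, the two switches and their defaults could be declared like this. This is only a sketch that assumes argparse; the post does not say which option parser the utility uses, and build_arg_parser is a made-up name.

```python
import argparse


def build_arg_parser():
    """Build a parser for the two switches (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="eximloganalyzer")
    # The defaults mirror the default file locations mentioned above.
    parser.add_argument("--logfile", default="/var/log/exim_mainlog",
                        help="path to the Exim main log")
    parser.add_argument("--config", default="/etc/eximloganalyzer.cfg",
                        help="path to the rules file")
    return parser


# With no switches, the defaults apply.
defaults = build_arg_parser().parse_args([])
# An explicit switch overrides its default.
custom = build_arg_parser().parse_args(["--logfile=/tmp/exim_mainlog"])
print(defaults.logfile, defaults.config)
print(custom.logfile, custom.config)
```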

The config file will contain a list of rules to look for in the Exim log. These rules consist of three parts:

  • Name of the rule (ex: “Rate Limited Hosts”)
  • A Regular Expression to find (ex: '.*\[(.+)\] .*Host is ratelimited')
  • A return value (ex: '\\1')

The return value might be a little complicated to understand, but it can be really useful. In the above example, the regular expression contains the group '(.+)', which captures the text between the [ and ]: in this case, an IP address. The return value is \\1, which refers to the first group, so the rule returns the IP address. Rather than return just the IP address, I could return a full string such as 'The IP address is: \\1'. If there were multiple groups in the regular expression, I could return 'The IP address is: \\1 which is host \\2'.
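
Here is a small sketch of that idea using the standard re module. The log line is made up for illustration and is not real Exim output. I am also assuming the return value ends up as the two characters \1 once the rule has been read, which is what '\\1' means as a Python string literal.

```python
import re

# The pattern and a return value from the example rule above.
pattern = re.compile(r".*\[(.+)\] .*Host is ratelimited")
template = r"The IP address is: \1"

# A made-up log line for illustration only.
line = ("2010-07-19 10:00:00 H=mail.example.com [192.0.2.10] "
        "rejected: Host is ratelimited")

match = pattern.match(line)
# expand() substitutes \1, \2, ... with the matching groups.
result = match.expand(template) if match else None
print(result)
```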

Requirement Specifications

Before I started writing the utility, I created a semi-detailed outline. I used this outline to write my tests and translate those tests into code:

  • Rules – kept in a config file
    • Either in /etc or specified
    • INI format
    • Line format: “Rule name: (regex, return)”
  • Parser starts
    • Looks for log file
      • Either in /var/log/exim_mainlog or specified
    • Looks for config (see above)
    • Reads in rules
      • Compiles regex
    • Loops through log file
      • Looks for matches
      • Adds 1 to report for return result
    • Prints report results in txt format
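
The middle of that outline (compile the regexes, loop through the log, add 1 to the report for each return result) can be sketched roughly as follows. This is not the code from the finished utility: count_matches and the (name, regex, return) tuple shape are assumptions for illustration, and the log lines are made up.

```python
import re
from collections import Counter


def count_matches(rules, lines):
    """Return {rule name: Counter of return results} (hypothetical helper).

    rules is a list of (name, regex, return_template) tuples.
    """
    # Compile each regex once, before looping through the log.
    compiled = [(name, re.compile(regex), template)
                for name, regex, template in rules]
    report = {name: Counter() for name, _, _ in compiled}
    for line in lines:
        for name, regex, template in compiled:
            match = regex.match(line)
            if match:
                # Add 1 to the report for this return result.
                report[name][match.expand(template)] += 1
    return report


rules = [("Rate Limited Hosts", r".*\[(.+)\] .*Host is ratelimited", r"\1")]
# Made-up log lines for illustration only.
lines = [
    "H=a.example.com [192.0.2.10] rejected: Host is ratelimited",
    "H=a.example.com [192.0.2.10] rejected: Host is ratelimited",
    "H=b.example.com [198.51.100.7] rejected: Host is ratelimited",
    "an unrelated line",
]
report = count_matches(rules, lines)
print(report)
```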

With these specifications in place, I decided I would make 4 different modules:

  • Config
  • Parser
  • Report
  • CLI

Now it’s time to set up the development environment.

Development Environment

Eclipse and PyDev

Eclipse and PyDev are consistently cited as a great Python development environment, so I decided to try them out. Eclipse can be downloaded from the Eclipse website. Once installed, go to (on a Mac) Help, Install New Software and add the url http://pydev.org/updates. Proceed to install PyDev.

Once installed, go into Eclipse Preferences, PyDev, Interpreter - Python and add a new Interpreter. On OS X, I used /usr/bin/python. Most instructions say to add the system-wide site-packages folder. I did not, and I will explain why later.

I also decided to install the Mercurial plugin for Eclipse as that is the RCS I use. To install, again, go to Help, Install New Software and add the url http://bitbucket.org/mercurialeclipse/update-site/raw/tip.

System-wide Python Packages

The system-wide packages that I installed were pip, virtualenv, nose, and coverage:

$ sudo easy_install pip virtualenv nose coverage

Virtual Environment

Once virtualenv was installed, I created a new virtual environment to develop in:

$ virtualenv --no-site-packages /Users/jtopjian/code/python/eximloganalyzer
$ cd /Users/jtopjian/code/python/eximloganalyzer
$ source bin/activate

Next, I installed Paste to help with creating a skeleton structure:

$ pip install PasteScript

Next, I created the area in which the actual code will be written:

$ mkdir src; cd src
$ paster create eximloganalyzer
$ cd eximloganalyzer
$ mkdir tests
$ python setup.py develop

For the paster script, I gave some very generic information about the package. Nothing really fancy at all. All information given can later be changed in the setup.py file.

And finally, I created a .hgignore file containing:

syntax: glob
*.pyc
*~
.DS_Store

That takes care of the directory structure. Back to Eclipse to begin a new PyDev Project.

Creating a PyDev Project in Eclipse

In Eclipse, go to File, New, PyDev Project.

For project name, I used eximloganalyzer. I unchecked Use default and gave a path of /Users/jtopjian/code/python/eximloganalyzer/src/eximloganalyzer. I unchecked Create a default 'src' folder....

When the project was created, I right-clicked on the top-level eximloganalyzer folder and went to Properties. In PyDev - PYTHONPATH, I added the second-level eximloganalyzer folder as a source folder.

Then for External Libraries, I added:

  • /Users/jtopjian/code/python/eximloganalyzer/lib/python2.6
  • /Users/jtopjian/code/python/eximloganalyzer/lib/python2.6/site-packages.

These folders were created by virtualenv. Remember when I did not add site-packages to the PYTHONPATH in Eclipse previously? Now this ensures that only packages installed by virtualenv are seen in the PyDev project — no external package can interfere.

Next, I initialized a Mercurial repository by right-clicking on the top-level eximloganalyzer folder, going to Team and Share Project.

Finally, and this is really cool, I set up nose to run each time a file in this project is saved. I did this by right-clicking on the top-level eximloganalyzer folder, going to Properties and creating a new Builder:

  • Choose Program
  • Name: Nose
  • Location: /usr/local/bin/nosetests
  • Working Directory: ${project_loc}
  • Arguments: --with-coverage -a !slow,!gui
  • Build Options: Launch in Background, After a “Clean”, During manual builds, and During auto builds.

Once all of that was done, I created an initial Mercurial commit:

$ hg addremove
$ hg commit -m "Initial Commit"

Now, after all of that, the environment is created and coding can begin. Damn.

Beginning the Application

The Config Module

I decided that the first place to start was with the Config module. This module would handle reading the config file and returning the Rules.

To begin, I created a new file in the tests directory called tests_configfile.py. The first few tests I fleshed out looked like:

import unittest
from eximloganalyzer.config import Config
 
class ConfigFileTests(unittest.TestCase):
    """Tests the general functionality of the config file"""
 
    def setUp(self):
        """Some initialization stuff"""
        self.configFile = 'tests/eximloganalyzer.cfg'
 
    def testOpenConfigPass(self):
        """Tests if Config object can open the config file"""
        self.failUnless(Config(self.configFile))
 
    def testOpenConfigFail(self):
        """Tests if Config file raises if it can't open the config file"""
        self.assertRaises(IOError, Config, 'Idontexist.cfg')

Upon saving, the Eclipse Console will print out some errors from Nose. This is fine since there is no corresponding Config module to test yet. This can be easily fixed. Under the second-level eximloganalyzer folder, I created config.py:

import ConfigParser, os
 
class Config:
    def __init__(self, configFile):
        self.config = ConfigParser.RawConfigParser()
        if os.path.exists(configFile):
            self.config.read(configFile)
        else:
            raise IOError('Unable to open config file')

And finally, a test configuration file called tests/eximloganalyzer.cfg:

[rules]
Outgoing via Cron: ('.* <= (.+?) U=(.+?) P=local S=(.+?) T="Cron', '\\2')
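
The value after the rule name is written like a Python tuple. One way to turn that text into a real (regex, return) pair is ast.literal_eval. The post does not show how the finished utility does this, so treat the following as an assumption rather than the actual implementation.

```python
import ast

# The value part of the rule line above, exactly as it appears in the file.
raw_value = r"""('.* <= (.+?) U=(.+?) P=local S=(.+?) T="Cron', '\\2')"""

# literal_eval only accepts Python literals, so it is safer than eval().
regex, return_value = ast.literal_eval(raw_value)
print(regex)
print(return_value)
```

Note that '\\2' in the file becomes the two characters \2 after evaluation, which is the form the re module expects for a group reference.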

When saving a file, Nose is triggered, and reports all tests passing. Next, test the ability to parse the [rules] section and return the rules:

    def testParseRulesPass(self):
        """Tests if Config can parse rules from ini file"""
        c = Config(self.configFile)
        rules = c.parseRules()
        self.assertEqual('Outgoing via Cron', rules[0][0])

Nose will give an error. Now add the implementation code:

    def parseRules(self):
        return self.config.items('rules')

And Nose reports... failure still?

AssertionError: 'Outgoing via Cron' != 'outgoing via cron'

That's odd, the rule names are coming back in lower-case. The ConfigParser documentation reveals that option names are passed through the optionxform method, which lower-cases them by default, so I need to tell ConfigParser not to do that. In the Config __init__ method, I make the changes:

        self.config = ConfigParser.RawConfigParser()
        self.config.optionxform = str

And all tests pass again. This test-driven development thing isn't so bad after all.
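
To see the difference side by side, here is a small standalone sketch. It is written for Python 3, where the module is named configparser rather than ConfigParser, and the sample value is a placeholder rather than a real rule.

```python
import configparser  # named ConfigParser in Python 2, as used in this article

SAMPLE = "[rules]\nOutgoing via Cron: example value\n"

# By default, option names pass through optionxform(), which lower-cases them.
default_parser = configparser.RawConfigParser()
default_parser.read_string(SAMPLE)
default_names = [name for name, _ in default_parser.items("rules")]

# Replacing optionxform with str keeps the names exactly as written.
preserving_parser = configparser.RawConfigParser()
preserving_parser.optionxform = str
preserving_parser.read_string(SAMPLE)
preserved_names = [name for name, _ in preserving_parser.items("rules")]

print(default_names)
print(preserved_names)
```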

It looks like my requirements for the Config module have been met. As simple as it is, I'll leave it like this for now and move on to the Parser module.

Parser Module

Just like with Config, I'll begin by creating a tests/tests_parser.py file:

import unittest
from eximloganalyzer.parser import Parser
 
class ParserTestsBasic(unittest.TestCase):
    """Tests basic init of the Parser"""
 
    def setUp(self):
        self.configFile = 'tests/eximloganalyzer.cfg'
        self.logFile = 'tests/exim_mainlog.txt'
 
    def testParserLogOpenPass(self):
        """Tests if Parser can open the log file"""
        self.failUnless(Parser(log=self.logFile, config=self.configFile))

As usual, Nose will complain. Next, the actual Parser module:

import os
from eximloganalyzer.config import Config
 
class Parser:
    def __init__(self, log='/var/log/exim_mainlog', config='/etc/eximloganalyzer.cfg'):
        if os.path.exists(log):
            self.log = log
        else:
            raise IOError('Unable to open log')
 
        if os.path.exists(config):
            self.config = Config(config)
        else:
            raise IOError('Unable to open config')

Great, all tests pass. However, Nose is saying that not all code has been covered:

Name                     Stmts   Exec  Cover   Missing
eximloganalyzer.parser      10      9    90%   9, 14

Lines 9 and 14 are the two lines that raise IOError. I'll add some tests for them:

    def testParserLogOpenFail(self):
        """Tests if Parser can fail when log is not found"""
        self.assertRaises(IOError, Parser, 'Idontexist.txt', self.configFile)
 
    def testParserConfigOpenFail(self):
        """Tests if Parser can fail when config is not found"""
        self.assertRaises(IOError, Parser, self.logFile, 'Idontexist.txt')

Nose is now happy:

Name                     Stmts   Exec  Cover   Missing
eximloganalyzer.parser      10     10   100%   

Moving on...

From this point on, it was more of the same -- write a test, write the implementation, make sure the tests pass, wash, rinse, repeat. I did end up re-writing some tests, but the nice thing was that I was able to find these bugs during the development process. Since nose ran every time I made a change, I was able to tell exactly which changes broke tests and either change the tests accordingly or change my implementation code.

I found that testing did not work out well with the Report and CLI modules. The Report module printed out a plain-text report with a timestamp. I did not know how to effectively test this timestamp (which included seconds) and became too lazy to look it up (bad, I know). So I ended up just writing the Report module and testing it in the Python interpreter until it worked. In hindsight, I could have tested this with doctests and ellipses.
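
For the record, here is roughly what the doctest approach could look like. report_header is a made-up helper, not the actual Report module. The ELLIPSIS option lets "..." in the expected output match the part that changes on every run.

```python
import doctest
import time


def report_header(title):
    """Return a report header ending in a timestamp (hypothetical helper).

    >>> print(report_header("Exim Report"))  # doctest: +ELLIPSIS
    Exim Report generated at ...
    """
    return "%s generated at %s" % (title, time.strftime("%Y-%m-%d %H:%M:%S"))


# Build a doctest from the docstring and run it explicitly.
parser = doctest.DocTestParser()
test = parser.get_doctest(report_header.__doc__,
                          {"report_header": report_header},
                          "report_header", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

Since nose was already running on every save, its doctest plugin could probably have picked this up as well, but I have not tried that.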

For the CLI module, I figured the only way to effectively test it was to use subprocesses and test the output. Since the output consisted of the plain-text report, I just tested it on the command line.
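
A subprocess-based check might look something like the sketch below. The command is a stand-in (a one-line Python program that prints a report-style heading), since running the real utility needs a log file and config in place. Note that subprocess.run with capture_output requires a much newer Python than the 2.6 used in this article.

```python
import subprocess
import sys

# Stand-in for the real command line; swap in the actual CLI to test it.
command = [sys.executable, "-c", "print('Rate Limited Hosts')"]

# Run the command and capture what it prints.
completed = subprocess.run(command, capture_output=True, text=True)
output = completed.stdout
exit_code = completed.returncode
print(exit_code, output)
```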

End Result

I've uploaded this utility along with the tests to PyPI. You can download it directly from the page, use easy_install, or use pip.

Conclusion

I've found this development environment to work wonderfully. It took about two weeks' worth of research to get it set up correctly, but once everything was in place, it made creating this utility dead simple.

However, I do have a few concerns. Since my background is in Systems Administration and not programming, I find this development environment to be overkill and not very portable. By portable, I mean that I spend a lot of time remotely logged into different servers. It's usually not possible for me to recreate the remote environment on my desktop in order to correctly write utilities. I think I will look into whether vi can be used in a similar way to this Eclipse setup.

Regarding Python itself, I very much enjoy working with it. However, I've used Perl since I first learned it back in 1997. It's always been my go-to language when I've needed to write any type of system administration utility. While Python does have its benefits, Perl has the benefit of being more popular in the administration arena (I find Perl more in demand than Python in system administration job postings). My next step is to research all of the above topics (IDE, test-driven development, virtual environments, packaging) with Perl to see how it works out.

Resources

Books

  • Foundations of Agile Python Development: I found the first few chapters of this book to be excellent. Unfortunately a lot of the information is outdated (and it's only 2 years old!) but the good news is that finding the corresponding updated information is not hard.

  • Python Testing: Beginner's Guide: I was only able to make it through the first few chapters of this book. I found the examples to be way too abstract and hard to follow. I think the target audience might have been computer science majors or engineers -- not everyday Python programmers.
