On Configuration Management

Introduction

I'm currently involved in a work exchange where my main focus is to teach OpenStack. Unfortunately, it's not possible to introduce configuration management due to time restrictions. Instead, we’re creating basic shell scripts to install the necessary components onto each server.

That made me wonder how complicated configuration management really is and whether it's necessary. This article explains the different pieces involved in configuring a managed server environment and how each piece adds complexity.

Introduction
Components of a Managed Environment
Thoughts
- Glorified Deployment Tool
- Reducing Complexity
Conclusion

Components of a Managed Environment

Server Deployment

In any environment, I feel the most important component is the server deployment component. This component handles creating new servers – it's where everything begins. A good deployment component is flexible enough to handle different types of servers, hardware profiles, and operating systems. If maintained, this component can act as an inventory for your server fleet.

I can split servers into two types: bare metal and virtual. I haven't been able to find one server deployment component to handle both types, but I have found one good component for each:

Bare Metal

I've stuck with Cobbler for bare metal servers. I can deploy RedHat and Debian based distributions as well as Windows with Cobbler. Cobbler also acts as my hardware inventory since it can store attributes about each bare metal server. Cobbler might take some time to learn and understand, but the result is worth the investment. Most of the learning curve comes from the use of Cheetah templates when writing custom snippets. I’ve read that Cobbler can use Jinja instead of Cheetah, but Cheetah hasn’t irritated me enough to switch.

Virtual

For virtual, there's Vagrant. Vagrant is an awesome application that can work with many virtual environments. Vagrant is pretty well-known, so I don't think I need to go into more detail about it.

Bootstrapping

Along with deployment, both Cobbler and Vagrant have the ability to bootstrap a server. Cobbler is flexible enough to use almost anything. Vagrant has several built-in provisioners for bootstrapping, including a versatile "shell" provisioner.

I've always used shell scripts with both Cobbler and Vagrant. There are two reasons for this:

Consistency across both tools
Ease and flexibility of shell scripts

To elaborate on the second point: sometimes it's easier to provide a list of initial packages rather than use a tool like Puppet. Another example might be post-deployment disk partitioning and formatting.
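As a rough illustration of the package-list case, a bootstrap script can be little more than a list and a check for which package manager is present. The package names and the BOOTSTRAP_PACKAGES variable are placeholders for this sketch, not something taken from a real environment, and the function only prints the command so it is safe to try out.

```shell
#!/bin/sh
# Hypothetical package list; adjust for your environment.
BOOTSTRAP_PACKAGES="curl git htop"

# Print the install command for whichever package manager is available.
# Printing instead of running keeps the sketch harmless.
bootstrap_command() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get install -y $BOOTSTRAP_PACKAGES"
  elif command -v yum >/dev/null 2>&1; then
    echo "yum install -y $BOOTSTRAP_PACKAGES"
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}
```

On a freshly deployed server, running the printed command as root (for example with eval "$(bootstrap_command)") would install the list.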

Remote Execution

Remote command execution is another common component in managed environments.

Not counting Windows environments, I have two categories of remote execution: SSH-based and RPC-based.

SSH Remote Execution

This is the simplest and most common way to execute commands on *nix-based operating systems. It's unusual to find a server that does not have SSH.

There are two forms of SSH remote execution: execution against one server and execution against several servers. I'll skip the former since it's simple to understand. The latter can be further split into two more groups: for loops and specialized tools.

SSH for Loops

Running the same SSH command on several servers can get tedious. Introducing a for loop can act as a shortcut:

for i in server-1 server-2 server-3
do
  ssh "$i" uptime
done

Storing the server names in a text file can enhance the loop:

for i in $(cat web-servers.txt)
do
  ssh "$i" uptime
done
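
Once the list lives in a file, it's tempting to add comments to it and to find out which servers failed. The following is one possible way to wrap the loop, not a replacement for a real tool. The run_on_hosts name and the REMOTE_SHELL variable are my own inventions for this sketch; REMOTE_SHELL defaults to ssh -n (so ssh does not swallow the host list on standard input) and can be set to echo to preview what would run.

```shell
#!/bin/sh
# Run a command on every host listed in a file, one host per line.
# Blank lines and lines starting with '#' are skipped.
REMOTE_SHELL="${REMOTE_SHELL:-ssh -n}"

run_on_hosts() {
  hosts_file=$1
  shift
  failed=""
  while read -r host; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    # $REMOTE_SHELL is left unquoted on purpose so "ssh -n" splits into words.
    if ! $REMOTE_SHELL "$host" "$@"; then
      failed="$failed $host"
    fi
  done < "$hosts_file"
  if [ -n "$failed" ]; then
    echo "failed on:$failed" >&2
    return 1
  fi
}
```

With that in place, run_on_hosts web-servers.txt uptime behaves like the loop above but reports the servers it could not reach.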

Specialized Tools

Specialized tools are the next step after for loops. The most common tools are dsh and ClusterSSH.

dsh is the tool that I'm most familiar with. I have it installed by default on my central "Command and Control" servers. I keep an ad-hoc list of servers stored in /etc/dsh/groups.

Ansible also has support for executing commands over SSH. Ansible is a much more robust tool than something like dsh, so I don't think it belongs in this category.

Tools such as Fabric, Rake, and Capistrano provide a DSL to help with execution. While a DSL can simplify the job at hand, it's a layer of abstraction that can result in a large time investment.

SSH Key Management

To use an SSH-based execution tool, you need to manage your SSH keys. At a minimum, you need to deploy your public key to each server's .ssh/authorized_keys file.

You can do this with a configuration management system, but it makes sense to do this at the bootstrapping stage. This way, the server can execute SSH commands as soon as the server is up.
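
A bootstrap script can handle the key with a few lines of shell. This is a minimal sketch, assuming the public key is passed in as a string; the install_public_key name is made up for the example. It appends the key only if it is not already present, so the script can run more than once.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already there.
# Both arguments come from the caller; nothing here is specific to Cobbler
# or Vagrant.
install_public_key() {
  pubkey=$1
  ssh_dir=$2            # for example /root/.ssh

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"

  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -qxF -e "$pubkey" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$pubkey" >> "$ssh_dir/authorized_keys"
  fi
}
```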

Root or Not

Another consideration is whether to execute the remote commands as root or as a non-root user. Further, should that non-root user have sudo access to root or sudo access to a whitelist of commands?

Full root access is the easiest, yet most dangerous. A non-root user with sudo access to root isn't any safer. A non-root user with sudo access to only a subset of commands is secure, but takes time and effort to manage.
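
To give a feel for the third option, here is a small sketch that prints a sudoers rule limiting one user to a list of commands. The sudoers_rule name, the deploy user in the usage note, and the choice of NOPASSWD are assumptions of the example. sudoers expects commands as absolute paths.

```shell
#!/bin/sh
# Print a sudoers rule that lets one user run only the listed commands as root.
# Each command must be an absolute path, optionally followed by its arguments.
sudoers_rule() {
  user=$1
  shift
  rule="$user ALL=(root) NOPASSWD:"
  sep=" "
  for cmd in "$@"; do
    rule="$rule$sep$cmd"
    sep=", "
  done
  printf '%s\n' "$rule"
}
```

For example, sudoers_rule deploy /usr/bin/uptime "/usr/sbin/service apache2 restart" prints a single rule that could be written to a file under /etc/sudoers.d and checked with visudo -c -f before use. The effort I mentioned comes from keeping that list current on every server.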

RPC Remote Execution

The next type of remote execution is RPC. An example of this kind of tool is MCollective. SaltStack is another example, but like Ansible, RPC-based execution is not its only job.

RPC remote execution has one big benefit over SSH remote execution: you do not need direct access to the server to execute a command. RPC works by having the executing server place a command on a queue while every other server listens on that queue. In this setup, commands can reach servers that are behind a firewall or NAT device.

Unfortunately, RPC remote execution is one area that I have not invested a lot of time in. MCollective looks like a great tool that has a lot of potential.

Server Attributes

To easily manage a fleet of servers, you need to know various attributes about them. As mentioned earlier, Cobbler can store hardware attributes. But other attributes exist, especially once you have installed the operating system.

For example: what IP address does eth0 have? What kernel is that server using?
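
Answering those two questions by hand takes only a few commands. The sketch below prints a handful of attributes as key=value pairs. It is only a loose imitation of what a dedicated tool reports; the print_facts name and the key names are chosen for this example, and the IP lookup assumes the ip command from iproute2 is available.

```shell
#!/bin/sh
# Print a few server attributes as key=value pairs.
print_facts() {
  iface=${1:-eth0}
  echo "hostname=$(uname -n)"
  echo "kernel=$(uname -s)"
  echo "kernelrelease=$(uname -r)"
  # "ip" may be missing on non-Linux systems, so treat it as optional.
  if command -v ip >/dev/null 2>&1; then
    addr=$(ip -4 -o addr show dev "$iface" 2>/dev/null |
      awk '{print $4}' | cut -d/ -f1 | head -n 1)
    if [ -n "$addr" ]; then
      echo "ipaddress_$iface=$addr"
    fi
  fi
}
```

Collecting the output is the easy part. Keeping it current and queryable across a fleet is where a dedicated tool earns its keep.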

Puppet Labs has Facter, which I think is an awesome tool. Chef has Ohai. Ansible and SaltStack have their own, and so on.

A tool that provides easy, up-to-date attributes about a server seems like an abstract concept. But once you've worked with such a tool, it's hard to go back.

The only difficulty that I've had with Facter is storing the attributes in a centralized database. It'd also be nice to be able to share attributes between servers. Puppet Labs has PuppetDB for this purpose, but it's coupled closely to Puppet. While you can use Facter outside of Puppet, you need Puppet to import facts into PuppetDB.

Configuration Database

Servers need data to perform a task: required users and passwords, application versions, and application options, for example. Having this configuration data easily available is a great asset.

Hiera is a database that provides this functionality. It can also store data in a hierarchy, providing generic settings across all roles and specific settings to individual roles.

Hiera also has the ability to plug into different back-end databases. This means you can use existing databases of information such as a company CRM. Or you can use something as simple as plain text files.
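
The hierarchy idea is easy to picture with plain text files. The following sketch is not Hiera's file format or lookup algorithm; it assumes a made-up layout of key=value files, with one file per role under roles/ and a common.conf holding the generic settings. The most specific file wins.

```shell
#!/bin/sh
# Look up a key in plain-text key=value files, most specific file first.
# Assumed layout: DATA_DIR/roles/<role>.conf, then DATA_DIR/common.conf.
# An empty value is treated the same as a missing key.
lookup() {
  data_dir=$1
  role=$2
  key=$3
  for file in "$data_dir/roles/$role.conf" "$data_dir/common.conf"; do
    [ -f "$file" ] || continue
    # Print everything after "key=" for the first matching line.
    value=$(awk -F= -v k="$key" \
      '$1 == k { print substr($0, length(k) + 2); exit }' "$file")
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}
```

If common.conf sets http_port=80 and roles/web.conf sets http_port=8080, then lookup /etc/mydata web http_port prints 8080 while any other role gets 80 (/etc/mydata is a hypothetical path).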

Like Facter, you do not have to use Hiera with Puppet.

Configuration Management System

And now for the big one: configuration management systems. I'm most familiar with Puppet, but a lot of this information applies to all current CMSs.

Domain Specific Language

The first attribute of all modern configuration management systems is the DSL.

Even if a tool does not advertise a DSL the way Puppet does, I still consider every current tool to have a DSL of some sort. Ansible, for example, uses playbooks to describe how it will apply a configuration to a server, and the playbook format is unique to Ansible.

DSLs have advantages and disadvantages. One advantage is the ability to abstract complex logic into something more simple. Puppet, for example, has the package type. The package type will figure out the best way to install a given package on different operating systems on your behalf:

# this will work on both Debian and RedHat
package { 'htop':
  ensure => installed,
}

To do this without the DSL, you would have to do something like this:

if [ "$distro" = "debian" ]; then
  apt-get install -y htop
elif [ "$distro" = "redhat" ]; then
  yum install -y htop
fi

But the DSL doesn't just abstract which command to use for a particular resource. It also figures out the state of the resource. For example: whether a resource is installed, what version is currently installed, and what version should be installed.

Puppet's DSL will compare the current state of a resource against the state that you want the resource to have. Based on that comparison:

An action is not performed if a resource is already at the desired state.
An action is performed if you choose to change the state.
An action is performed if the state changed outside of your control.

A lot of work goes into figuring out state. For example, you can define a user in Puppet like so:

user { 'jtopjian':
  uid        => 500,
  gid        => 500,
  managehome => true,
  password   => 'a1b2c3d4',
}

Creating the same user in Bash is easy:

# groupadd -g 500 jtopjian
# useradd -m -g 500 -u 500 -p a1b2c3d4 jtopjian

But you can only run those commands once. Running them twice will result in errors that the group and user already exist. To get around this, you could first check the existence of the user:

# Create the group only if it is missing
if ! grep -q '^jtopjian:' /etc/group; then
  groupadd -g 500 jtopjian
fi
# Create the user only if it is missing
if ! grep -q '^jtopjian:' /etc/passwd; then
  useradd -m -g 500 -u 500 -p a1b2c3d4 jtopjian
fi

Still, this is not verifying whether the jtopjian user has a uid of 500, a gid of 500, and has a password hash of a1b2c3d4. The code becomes quite complex by the time you account for these attributes. Even if you invest the time and effort, chances are your code won't work as-is on FreeBSD just as it did on Ubuntu.
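
To show where that complexity starts, here is a sketch of just the first step: checking whether an existing user matches the uid and gid you want. The user_matches name is made up for this example. It says nothing about the password hash or about how to repair a mismatch, which is where the per-platform work begins.

```shell
#!/bin/sh
# Return 0 only if the user exists with the expected uid and primary gid.
# "id" is specified by POSIX, so this check itself is portable.
user_matches() {
  name=$1
  want_uid=$2
  want_gid=$3
  have_uid=$(id -u "$name" 2>/dev/null) || return 1
  have_gid=$(id -g "$name" 2>/dev/null) || return 1
  [ "$have_uid" = "$want_uid" ] && [ "$have_gid" = "$want_gid" ]
}
```

A script could call user_matches jtopjian 500 500 and only act when it returns non-zero. Deciding whether to create, modify, or leave the user alone is still up to you.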

State is the best argument for using a DSL-based configuration management system.

The biggest disadvantage, though, is how the DSL can become so complicated that you lose sight of the task at hand.

For example, you could be an Apache and MySQL expert. But to understand the puppet-apache and puppet-mysql modules, you also need to understand Puppet. If you spend enough time working with either, you'll begin to lose sight of how you used to do the same basic tasks on the command line.

You could become reliant on the DSL to perform basic tasks such as creating a database or a virtual host.

Files and Templates

I hate using static files and templates to configure applications. If you don't keep the copies up to date, they become stale and might break the application.

Further, most people don't bother templating all options in a configuration file. If you use a module that is missing a configuration option, you must add the missing option to both the template and the manifest. If you do this enough times, you will see that this pattern is a waste of time. You're simply using Puppet as a conduit to pass an arbitrary value at an arbitrary position in a configuration file. It's a modern day Useless Use of Cat.

The only benefit of this conduit is the ability to validate the options inside the manifest. But most applications already do that, so you're still duplicating effort. It's easier to let the application report the error than to template an entire configuration file, validate the options, and stay up to date with the latest options.

If you need to edit a configuration file, try to do it inline. Puppet provides types such as file_line and ini_setting to assist with this. Augeas provides a way to edit configuration files of many formats inline.
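
The same inline idea works without Puppet. The sketch below sets a single key = value line in a file and leaves every other line alone. It is loosely modelled on the behaviour I want from ini_setting, but it is not that type's implementation: the set_option name is my own, and there is no support for sections.

```shell
#!/bin/sh
# Set "key = value" in a config file, touching only that one line.
# If the key is absent, the line is appended at the end of the file.
set_option() {
  file=$1
  tmp=$(mktemp) || return 1
  # Key and value go through the environment so awk does not interpret
  # backslashes in them.
  if OPT_KEY=$2 OPT_VALUE=$3 awk '
    BEGIN { k = ENVIRON["OPT_KEY"]; v = ENVIRON["OPT_VALUE"]; found = 0 }
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # ignore leading whitespace
      eq = index(line, "=")
      name = (eq > 0) ? substr(line, 1, eq - 1) : ""
      sub(/[ \t]+$/, "", name)          # ignore whitespace before "="
      if (name == k) { print k " = " v; found = 1 } else { print $0 }
    }
    END { if (!found) print k " = " v }
  ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"                # keeps the original inode and mode
  fi
  rm -f "$tmp"
}
```

Running set_option twice with the same arguments leaves the file unchanged the second time, and any option the function does not know about stays exactly as the application's packager wrote it.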

Static files and templates have their use outside of configuration files. An example is an OpenStack openrc file. The openrc file is a supplemental file that provides an easy way to authenticate to OpenStack. Whether you have an openrc file or not, OpenStack doesn't care.

Stop templating configuration files. You'll be amazed at how much more flexible and future-proof your module is.

Lines of Separation

Where does the job of Puppet begin and end? Should it start as soon as the operating system is installed? Should it end when a specific application has been installed, or should it configure the application, too?

The former was already covered under Bootstrapping in the Server Deployment section. I'll focus on the latter.

Let's say you use Puppet to install Trac. You also decide to prep Trac with a few admin users. Once everything is up and running, including the users, you send a notice out that the site is now available for use. The admin users begin using the site and begin adding other users.

Who handles those other users? Should they be "back ported" to Puppet so the Trac configuration matches what is in Puppet? Should you schedule a separate backup just for the Trac database? If that's the case, why even bother managing the Trac users through Puppet in the first place?

Or why not just use Puppet to install the Apache and Trac packages, then use the existing trac-admin tool to add the admin users?

I have no firm opinion on this and unfortunately I spend more time than I'd like contemplating this situation.

Validating the Entire Configuration

Most server configurations start simple and grow large. For example, you deploy a new web server that will host a single web site. Your Puppet manifests for this server are nice and succinct. Then, a month later, you get a request to host a new web site on this server, so you attach a new manifest. Then you get another request. This time, you have to install a couple new libraries.

As time goes on, this web server's configuration has grown from one site to many.

When was the last time you tried deploying a brand new web server from scratch using the present configuration? Are you sure that the order in which Puppet creates all sites will work without error?

To assist with this issue, the concepts of Continuous Delivery, Continuous Integration, and all kinds of testing are crossing over from the software development world into the operations world. This isn't a bad thing, but it definitely adds new layers of complexity to an environment.

As a systems administrator, I now need to understand tools like RSpec and Jenkins to make sure that the new configuration management system won't break servers. These same servers would never have had these issues if I had continued configuring everything manually or with simple shell scripts.

Thoughts

Glorified Deployment Tool

The last two sections, Lines of Separation and Validating the Entire Configuration, introduce a lot of complexity into your environment.

If you're unable to create a standard policy about where the line of separation should be or if you're unable to confirm whether your configuration can be safely deployed onto a new server, the CMS is at risk of becoming just another deployment tool: the server has its operating system installed, the configuration management system applies a configuration once, and that's it.

Reducing Complexity

A typical Puppet environment for me includes a Puppet Master, PuppetDB, Hiera, environments, and r10k. There are roles and profiles for each type of service, some of which are simple and others abstract and advanced. Several third-party modules support these roles and profiles. Sometimes only one version of a module is known to work, so you must pin that version to the environment.

When I began writing shell scripts for deployments a few weeks ago, I was amazed at how complex I had let my environment become. The difference between a shell script and the environment I just described was too large for me to dismiss.

How do you reduce the complexity of a managed environment?

If you stop using a DSL-based system, you gain the advantage of being able to use shell scripts, but the disadvantage of not being able to (easily) keep state.

If you choose not to continuously test your configuration for successful deployments, you're no longer able to trust your CMS.

I don't have an answer, and might not ever have a satisfactory answer, but there are a few areas that I'm currently looking into:

Shell-based Configuration Management System

I would have thought that a tool which sits somewhere between Capistrano and Puppet exists, but I haven't been able to find it yet.

Some notable tools that I have found are:

cdist: a nice tool, but I'm not a fan of the gencode system. cdist compiles a "catalog" by generating the shell script that the remote server will execute. The catalog generation is known as "gencode" and is also written in shell. This creates two layers of shell scripts.
SM Framework: a useful Bash framework, but I don't think it has a way to easily execute remote modules.
rerun: similar to SM.

Less Complex Puppet Modules

I have no idea what "less complex Puppet modules" are, but some pipe dream ideas include:

Smaller than the puppet-apache and puppet-mysql modules (yet provide the same amount of features!).
Few class parameters.
Little-to-no use of templates.

In some ways, I find R.I's ideas (1,2) about Hiera and parameters a "six of one, half a dozen of the other" problem. I also find the current Data in Modules spec overly complex.

Sometimes I think that I should just try another configuration management system, but that's just a "grass is always greener" thought. For example, Chef has the same issue with regard to Node attributes and Data Bags.

Masterless Puppet

The idea of running a Puppet environment without a Puppetmaster is interesting. This presentation will describe the advantages of doing this.

Image Artefacts

The idea here is that instead of building your servers from scratch, build each type of server once, strip the unique components, and create an image or template from the result. Now when you need to build a server, base it off of that "pre-baked" image.
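
As an illustration of the "strip the unique components" step, the sketch below removes a few per-machine files from a filesystem tree. The list is an assumption about a typical Linux install and certainly not complete, and the strip_unique name is made up. It takes the root of the tree as an argument so it can be pointed at a mounted image rather than the running system.

```shell
#!/bin/sh
# Remove per-machine identity from a filesystem tree before imaging it.
strip_unique() {
  root=$1
  if [ -z "$root" ]; then
    echo "usage: strip_unique ROOT" >&2
    return 1
  fi
  # Host keys must differ per server, so something (cloud-init, a first-boot
  # script) has to regenerate them when a new server starts.
  rm -f "$root"/etc/ssh/ssh_host_*
  # An empty machine-id is filled in again by systemd at boot.
  if [ -f "$root/etc/machine-id" ]; then
    : > "$root/etc/machine-id"
  fi
  # Older udev setups pin NIC names to MAC addresses in this file.
  rm -f "$root/etc/udev/rules.d/70-persistent-net.rules"
}
```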

Here's a great article that provides more details.

This method also gave me a great understanding of the use of tools such as etcd and doozerd.

Conclusion

I assembled this article from notes that I've been keeping for the past few weeks. These notes helped me see the different pieces involved with the configuration portion of a managed server environment, why they exist, and how removing them would affect the environment as a whole.

I welcome comments, suggestions, and fixes.

terrarum