Infrastructure Management with Capistrano

Introduction
Installation
- Initial Usage
Hiera Integration
Modules
Vagrant
- Hiera
- Vagrant Module
Bootstrapping Servers
SSH
Masterless Puppet
Conclusion

Introduction

Since writing my last article, On Configuration Management, I've been researching different tools. I found several great ones on Github, such as:

All these tools had great ideas and I'm glad to have found them. Sprinkle and Babushka were particularly interesting because of their conditional testing after each task. This allows both a form of testing and idempotence checking in one step.

Unfortunately, none of them were exactly what I was looking for, so I began working out what my perfect tool would be. I realized that I was looking for more than a configuration management system. Configuration management (at the level of Puppet et al) is only a small part of infrastructure management. I wanted a tool that would help me corral all parts of infrastructure management into a single framework. I want the ability to apply Puppet manifests on one server, configure a Gluster volume on a second server, and create a Gitolite user on a third.

I defined three core values that I wanted this tool to have:

Push-based
Task-driven
Extendible

Capistrano was a perfect fit for all three of those items. It's also well-known which is a bonus. And being based on Rake was an added benefit.

My Capistrano setup is still a work in progress, but here's what I have so far:

Installation

Capistrano's README file does a great job explaining both how to install Capistrano and the basics of using it. If you aren't familiar with Capistrano, I recommend reading it along with the docs on Capistrano's homepage before continuing on.

Initial Usage

I made a slight change to the Gemfile in that I'm using the trunk version of sshkit. It has better error reporting than the current gem available.

gem 'sshkit', :git => 'https://github.com/capistrano/sshkit'

To begin, add an existing server to config/deploy/staging.rb. There are several examples in the stage template to go off of.

Next, add a file called lib/capistrano/tasks/utils.cap with the following contents:

namespace :utils do
  desc "Check the uptime of servers"
  task :uptime do
    on roles(:all) do |host|
      info "Host #{host} (#{host.roles.to_a.join(', ')}):\t#{capture(:uptime)}"
    end
  end
end

At this point, you should have a file and directory structure similar to this.

You should be able to run the following command and see the uptime of the server (or servers) that you added to the deploy file:

$ cap staging utils:uptime

Note: Capistrano requires that all hostnames are resolvable. You can either create a proper DNS entry or add an entry to /etc/hosts. Also, if you will be churning servers, make sure to clean up old SSH fingerprints or add the following to ~/.ssh/config:

Host *
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null

Hiera Integration

Adding servers to the deploy file is fine, but I thought it would be better if they were stored in Hiera. I'm still a novice with Ruby, and with some help I got it working.

First, add the following to the Gemfile:

gem 'hiera'

and run:

$ bundle install

Next, create the following files:

lib/cap_hiera.rb: This file makes a few Hiera-based functions available in Capistrano.
hiera/hiera.yaml: This file is a basic Hiera configuration. You can see where this file is called in cap_hiera.rb.
hiera/data/staging.yaml: This file will contain server definitions for the "staging" stage.

You can verify that Hiera is working by running the following:

$ hiera -c hiera/hiera.yaml servers stage=staging

Once confirmed working, you can change the Capfile and add the following lines:

require_relative 'lib/cap_hiera'
hiera_build_servers_from_stage ARGV[0]

You can now remove any server entries made in the config/deploy/staging.rb file. Unfortunately you cannot remove the file itself as Capistrano uses the name of the file as a stage definition.

You can also add global settings in Hiera. For example, add the following to hiera/data/default.yaml:

ssh_options:
  :user: 'ubuntu'
  :keys: '/path/to/key'

And add the following to Capfile:

set :ssh_options, hiera('ssh_options')

At this point, you should have a Capistrano directory that looks similar to this.

Modules

So far, this Capistrano installation has a utils.cap task file and some Hiera files. Since I'll be adding more features, I wanted to organize everything into quasi "modules". At the moment, this module structure is not suited for large redistribution. It's more to organize local files.

Hiera

Here's how the Hiera configuration looks I converted it to a "module":

modules
└── hiera
    ├── files
    │   ├── data
    │   │   ├── default.yaml
    │   │   └── staging.yaml
    │   └── hiera.yaml
    └── lib
        └── cap_hiera.rb

cap_hiera.rb, hiera.yaml, and Capfile will all need modified to account for the new paths.

Utils

You can move the utils.cap task into its own module:

modules
└── utils
    └── tasks
        └── utils.cap

Again, Capfile will need modified to import the tasks at the new location:

Dir.glob('modules/*/tasks/*.cap').each { |r| import r }

If you modified all files correctly, then the following should work as it did before:

$ cap staging utils:uptime

At this point, you should have a directory structure similar to this.

Module Caveats

There are a lot of hard-coded paths in the modules. They also contain site-specific data, so public distribution is a bad idea – especially if the module contains sensitive information.

Vagrant

The next feature is Vagrant support. Being able to add existing servers to Hiera is fine, but I want to be able to add new servers to Hiera and have Vagrant create those servers.

I use Vagrant with the vagrant-openstack-plugin so this section will be specific to that. It's easy to swap out this configuration with another cloud plugin and should not be hard to change for basic VirtualBox.

Hiera

In the default.yaml file, I have the following:

# Global Server Defaults
server_defaults:
  :cloud: 'mycloud'
  :provider: 'openstack'
  :private_key: '/path/to/key'
  :flavor: 'n1.small'
  :image_id: '4042220e-4f5e-4398-9054-39fbd75a5dd7'
  :keypair: 'home'
  :user: 'ubuntu'
  :security_groups: ['default', 'openstack']

These attributes match to settings used in the Vagrant OpenStack plugin. The :cloud attribute is an arbitrary name. Hiera merges these default settings with defined servers by the hiera_get_server method.

Vagrant Module

The Vagrant module looks like this:

vagrant
├── files
├── tasks
│   └── vagrant.cap
└── tpl
    └── Vagrantfile.mycloud.erb

Notice the template, which you can view here. The values inside the template are filled in with the corresponding server attributes. Using a template like this makes building Vagrantfiles inflexible and maybe I'll be able to fix that at some point.

The vagrant.cap file contains three tasks for managing Vagrant machines. The vagrant:new task contains a method called render_template. This is a "helper" function which is defined in a new file, modules/utils/lib/helpers.rb. Add helpers.rb to Capfile:

require_relative 'modules/utils/lib/helpers'

There are some methods to colourize output. This requires the colorize gem and you should add it to the Gemfile.

You should now be able to create a new Vagrant virtual machine by calling vagrant:new. If you run this task without a "host filter", Capistrano will create Vagrant virtual machines for every server defined. So to create a single server, do the following:

$ cap staging --hosts example.com vagrant:new

If all was successful, a Vagrantfile under modules/vagrant/files/mycloud/example.com will be available. You can change into that directory and run:

$ vagrant up --provider=openstack

Or use the vagrant:up task instead of vagrant:new.

At this point, your Capistrano directory should look similar to this.

Bootstrapping Servers

Vagrant can provision virtual machines with services such as shell, Puppet, or Chef. But I also want to provision other types of servers. By using Capistrano, I can create a bootstrap task that emulates the Vagrant shell provisioner. Now I can bootstrap bare-metal servers as well as virtual machines created outside of Vagrant.

I decided to place the "bootstrapping" task under a new module called "server":

server
├── files
│   └── bootstraps
│       └── ubuntu.sh
└── tasks
    └── server.cap

The server.cap file looks like this and the ubuntu.sh bootstrap script looks like this.

With this module in place, I can bootstrap any type of server by performing the following task:

$ cap staging --hosts example.com server:bootstrap[ubuntu.sh]

At this point, your Capistrano directory should look something like this.

SSH

Capistrano can now read an inventory of servers from Hiera, provision them with Vagrant, and bootstrap them with shell scripts. This next section introduces some SSH tasks:

Uploading keys
Creating SSH host entries
Running arbitrary commands

The SSH module looks like this:

ssh
├── files
│   └── keys
│       ├── test
│       └── test.pub
└── tasks
    └── ssh.cap

You can see the ssh.cap file here. Note the upload_and_move method. This is a new method added to the helpers.rb library, seen here. This method uploads a file and then uses sudo to move the file to its remote destination.

With this module in place, the following workflow is possible:

# Upload a key
$ cap staging --hosts example.com ssh:add_key[root,test]
# Add a host that requires the uploaded key
$ cap staging --hosts example.com ssh:add_host[foobar.com,/root/.ssh/test]
# Run a simple command on the server
$ cap staging --hosts example.com ssh:cmd['ls -la']

Sharing private SSH keys across hosts isn't the most secure thing to do. But it's more simple than generating a key on a new host and then configuring a service to use the new key. (though such tasks were difficult to do in Puppet, they are now easier with a task-based approach!)

Masterless Puppet

Now for Puppet. I was eager to try out a "masterless" Puppet workflow and read everything I could find on the topic. After actually implementing this style of Puppet, I've found that it's more difficult than it initially sounds.

The difficulty comes from ensuring a server receives only the configuration data it needs. Blanketing all servers with a single Puppet repository that includes all site-specific data puts your entire infrastructure at risk: if a single server is compromised, your entire infrastructure configuration is exposed.

With this thought in mind, using a masterless Puppet workflow might even be more secure than a central Puppet server.

The Puppet Module

The Puppet module looks like this:

puppet
├── files
│   └── staging
│       ├── Puppetfile
│       ├── base.pp
│       └── web.pp
└── tasks
    └── puppet.cap

The idea is to have one Puppetfile per stage. The Puppetfile contains the modules used in that stage. You can also specify specific versions of the modules to use. It's possible that each server will not use all modules, but in my opinion, this is an acceptable waste of space. The alternative is to create one Puppetfile per server or per role.

Next, each role has one Puppet manifest. This maps to the common "roles and profiles" pattern. Capistrano will apply each role manifest separately. All information that the server needs for that particular role must be in the manifest. The inability to share data between manifests can be an issue, but I see it as a way to enforce contained and non-conflicting roles.

Finally, each server has a "base" role for "free". You do not have to specify this role in Hiera. If a base.pp manifest exists for a stage, then each server in that stage will have the "base" role applied.

You can look through the staging files here.

Notice that base.pp applies a site::roles::base role. According to the Puppetfile, the site module is located on an internal server. The site role might contain sensitive information or information that you don't feel like sharing publicly.

The Hiera Module

Hiera plays a big role in my Puppet work and I wanted to continue using it with a masterless Puppet environment. Since Capistrano is already configured to use Hiera, it makes sense to continue using that module. I re-arranged the Hiera module structure to look like this:

hiera
├── files
│   ├── data
│   │   ├── capistrano
│   │   │   ├── default.yaml
│   │   │   └── staging.yaml
│   │   └── staging
│   │       ├── default.yaml
│   │       └── web.yaml
│   └── hiera.yaml
└── lib
    └── cap_hiera.rb

I split the data directory into two directories. A capistrano directory will hold Capistrano-specific data. A staging directory that corresponds to the "staging" stage. There is a default.yaml file inside the staging directory. It contains global settings (such as for the "base" role) and a YAML file for each other role.

The Puppet Module (con't)

The puppet.cap task file is the most complex task file to date. While it contains a lot of logic, it should not be difficult to understand, though.

I based a lot of the puppet.cap off of the existing work done in Supply Drop.

The first few tasks should be self-explanatory. The task that requires some notes is the puppet:deploy task.

The first thing to notice about this task is that it introduces a few new methods:

md5_diff: Compares the MD5 hash of a local and remote file
upload_and_move_if_changed Uploads the local file to the remote server if the MD5 hash is different.

Unfortunately these two tasks can involve a lot of SSH chatter to do the comparison and make the decision. I plan to look into different methods of simplifying this such as storing an MD5 cache locally or building an rsync queue.

Once you understand the new methods, the rest of the task becomes quite simple:

Capistrano generates a hiera.yaml configuration file which includes all server roles.
Capistrano uploads default.yaml, base.pp or <role>.pp if they exist.
Capistrano uploads and runs the stage's Puppetfile.

I use r10k to control the Puppetfile because I found librarian-puppet too strict with regard to certain Puppet modules' Modulefile.

Masterless Puppet Workflow

With all this in place, here is my current workflow for a masterless Puppet setup:

Add a role to a server definition in hiera/files/data/capistrano/stage.yaml.
Create a Puppet manifest titled role.pp in puppet/files/stage.
Deploy the files to a server with:

$ cap staging --hosts example.com puppet:deploy

Preview the Puppet run by doing:

$ cap staging --hosts example.com puppet:noop
# or
$ cap staging --hosts example.com puppet:noop[verbose]

Apply the Puppet manifests to the server:

$ cap staging --hosts example.com puppet:apply

At this point, your Capistrano directory should look similar to this

Masterless Puppet Thoughts

When I started using a masterless Puppet workflow, a few things were immediately clear:

Simplified Puppet with no more central server, PuppetDB, or certificates
Lack of exported resources and sharing information between servers

I'm enjoying the first point but will have to dedicate some time to solving the second point. My initial thoughts are to use facter-dot-d more and perhaps something like Juju.

Conclusion

This concludes my current Capistrano setup. It's still a work-in-progress but I've been able to use it as a daily tool.

There are a few areas where I'd like to improve on:

More efficient file transfer (rsync queue?)
Redistributable modules (.gitignore in the files directory?)

Update: I have refactored the modules described in this article into something more redistributable. See here.

Some tasks can be difficult to kill or cancel. I'm not too sure how to resolve this issue.

I'd love to hear comments, ideas, patches, or criticism.

terrarum