Puppet Infrastructure 2015

Preface

My previous article, Puppet Infrastructure, was written back in 2013. It's one of the most popular articles on my blog and seems to have helped a good number of people (as well as drawn criticism from others).

With it being 2015 and all, I thought I'd write an updated version and include some notes and thoughts from using Puppet throughout 2014.

Preface
Introduction
Prep-work
Install and configure PuppetDB
Setting Up Version Control
- What About Environments?
Building the Site Module
Committing Everything
Conclusion
Resources

Introduction

This tutorial will explain how to create a new Puppet environment using best practices such as version control, a site-local module, and roles & profiles. It will also include some practices that aren't considered "best", but have helped my sanity while using Puppet in production on a daily basis. It assumes the reader will be using Ubuntu. The instructions were verified with Ubuntu 14.04.

Note that this is not an intro to Puppet – this tutorial assumes that you have at least a beginner's knowledge of Puppet.

Prep-work

The Puppet Labs Package Repo

Install the Puppet Labs apt repo:

wget http://apt.puppetlabs.com/puppetlabs-release-trusty.deb
dpkg -i puppetlabs-release-trusty.deb
rm puppetlabs-release-trusty.deb
apt-get update

Install and Configure Puppet Server

This tutorial will use Puppet Server – the next generation of Puppet Master. You can read more about it here and here.

To install it, do:

apt-get install -y puppetserver

I haven't used Puppet Server extensively yet, but it's worked fairly well so far. I plan to use it throughout 2015 and if I run into any major issues, I'll update this tutorial.

Configure Puppet Server

By default, Puppet Server allocates 2 GB of RAM to its JVM. If your server does not have that much, you can lower the amount by editing /etc/default/puppetserver:

sed -i -e 's/JAVA_ARGS="-Xms2g -Xmx2g -XX:MaxPermSize=256m"/JAVA_ARGS="-Xms1g -Xmx1g -XX:MaxPermSize=256m"/' /etc/default/puppetserver
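
If the defaults file has already been edited, the exact-match sed above won't find anything to replace. As a rough sketch, the helper below (set_heap is a made-up name, not something Puppet Server provides) rewrites only the -Xms and -Xmx values on the JAVA_ARGS line and leaves the other flags alone:

```shell
# set_heap FILE SIZE: set both heap flags on the JAVA_ARGS line of FILE.
# Hypothetical helper for illustration; SIZE is a value such as 1g or 512m.
set_heap() {
  heap_file="$1"
  heap_size="$2"
  heap_tmp="${heap_file}.tmp.$$"
  # Only touch the line that starts with JAVA_ARGS=, and only -Xms/-Xmx.
  sed \
    -e "/^JAVA_ARGS=/ s/-Xms[0-9][0-9]*[kKmMgG]/-Xms${heap_size}/" \
    -e "/^JAVA_ARGS=/ s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${heap_size}/" \
    "$heap_file" > "$heap_tmp" && cat "$heap_tmp" > "$heap_file"
  heap_status=$?
  rm -f "$heap_tmp"
  return "$heap_status"
}
```

For example, set_heap /etc/default/puppetserver 1g, followed by a restart of the puppetserver service.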

Additionally, I configure my Puppet servers with the following two settings:

The Future Parser

puppet config set --section main parser future

I've been using the future parser since it was first announced in Puppet 3.2. In my opinion, iteration was a sorely missing feature in Puppet and I have abused the hell out of it since it became available.

However cool the future parser may be, keep in mind that the features it provides are not official. Future releases could remove them, or worse, break them and thus break your environment. This actually happened to me with the Puppet 3.5 release and I spent the afternoon recovering from an almost catastrophic puppet run.

Manifest Ordering

puppet config set --section main ordering manifest

I've included a list of great resources at the end of this article. One of them covers the "manifest ordering" feature in detail. It's a great read and I agree with its principles on why you shouldn't use it.

When I first started using Puppet, I hated dependency-based ordering, but since there was no other option, I had to live with it and eventually it became second nature.

However, if I could get back the ridiculous number of hours I've spent trying to resolve ordering issues, I'd have an awesome vacation. Dependency-based ordering is great in theory, but unless you write all of your Puppet manifests and modules yourself and are very strict about following dependency conventions, it's very difficult to get right in practice. Because of this, I've begun using manifest ordering.

I understand that by using manifest ordering, I'm not helping solve the larger issue of fixing dependency problems when I find them in others' modules. I wish I had more time to do that, but unfortunately I'm not paid to work on Puppet all day – sometimes I need to get back to work.

And to play devil's advocate: really, where else am I going to have to use a dependency-based system? Everything else I use executes in a top-down format. And has for loops.
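
Both puppet config set commands in this section write their values into the [main] section of puppet.conf. On a machine with Puppet installed, puppet config print parser ordering shows the effective values. The sketch below does a similar check by reading the file directly; ini_get is a hypothetical helper and only understands simple "key = value" lines:

```shell
# ini_get FILE SECTION KEY: print the value of KEY inside [SECTION].
# Hypothetical helper for illustration; not part of Puppet.
ini_get() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*\[/ {                        # a [section] header line
      gsub(/^[ \t]*\[|\][ \t]*$/, "")    # strip the brackets
      current = $0
      next
    }
    current == section {
      eq = index($0, "=")                # split on the first "="
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around both
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }
  ' "$1"
}

# Example:
#   ini_get /etc/puppet/puppet.conf main parser
#   ini_get /etc/puppet/puppet.conf main ordering
```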

Generate an SSL Certificate

Since the Puppet Server has just been installed, it hasn't generated an SSL cert yet. It will do this the first time you start the service, but the cert will be needed in the following step. To generate the cert, do:

puppet cert generate $(puppet config print certname)

Install and configure PuppetDB

PuppetDB is a complementary service to Puppet. It's not required, but it's very useful. Unfortunately, this tutorial will only cover the installation of PuppetDB and not what makes it so useful.

To install and configure it:

cd /etc/puppet/modules
puppet module install puppetlabs/puppetdb
apt-get -y install puppetdb
cd /root
echo include puppetdb > pdb.pp
echo include puppetdb::master::config >> pdb.pp
puppet apply --verbose pdb.pp
rm pdb.pp
rm -rf /etc/puppet/modules/*

Note that the certificate generated in the previous step must exist for PuppetDB to install cleanly.
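
Since PuppetDB depends on that certificate, it can be worth checking for it before running the install. A minimal sketch, assuming the default layout where signed certificates live in certs/ under Puppet's ssldir (cert_exists is a hypothetical helper):

```shell
# cert_exists SSLDIR CERTNAME: succeed if SSLDIR/certs/CERTNAME.pem exists.
# Hypothetical helper for illustration; not part of Puppet.
cert_exists() {
  [ -f "$1/certs/$2.pem" ]
}

# Example (on the Puppet server):
#   certname=$(puppet config print certname)
#   cert_exists "$(puppet config print ssldir)" "$certname" \
#     || puppet cert generate "$certname"
```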

Setting Up Version Control

It's a best practice to keep all the configuration management information in a version control provider such as git. Since the repository will probably contain sensitive information about your environment, it's recommended to use an internal git server. Managing one is very easy by using gitolite.

Seriously. Do not use GitHub, or any other public git host, if your repository will contain sensitive information. And if you accidentally check in sensitive information, delete the entire repository immediately. Don't just delete the sensitive information and commit the changes. It sounds like common sense, I know, but it happens. Also consider using some type of encryption on your repo like hiera-gpg or blackbox.

Some people keep the entire /etc/puppet directory in the repository. There's nothing wrong with this and if this is what you'd like to do, the following should work:

cd /etc/puppet
git init
git add auth.conf fileserver.conf puppet.conf manifests/ modules/
git commit -m "Initial commit"
git remote add origin <remote repo location>
git push -u origin master

Another way of keeping all Puppet configuration in a repository is to create a module that will only be used for the particular Puppet environment being created. This method will be used for this tutorial.

Begin by creating a module:

cd /etc/puppet/modules
mkdir site
cd site
mkdir manifests ext data
touch data/.keep
touch manifests/.keep
cd ext
touch site.pp
ln -s /etc/puppet/modules/site/ext/site.pp /etc/puppet/manifests/site.pp
cd ..
git init
git add .
git commit -m "Initial commit"
git remote add origin <remote repo location>

Three directories were created inside the site module:

manifests: where your site-local manifests will go
data: where your Hiera data will go
ext: where your "extra" files will go. These are files that complement or supplement your environment, plus the main site.pp file.
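
The same layout can be created in a repeatable way. The sketch below is a variant of the commands above under a few assumptions: scaffold_site is a hypothetical helper, it takes the Puppet config directory as an argument so it can be tried against a scratch directory first, and it skips the git steps:

```shell
# scaffold_site ROOT: create ROOT/modules/site with manifests, ext and data,
# then link ROOT/manifests/site.pp to the copy kept inside the module.
# Hypothetical helper for illustration; safe to run more than once.
scaffold_site() {
  root="$1"                          # e.g. /etc/puppet
  site="$root/modules/site"
  mkdir -p "$site/manifests" "$site/ext" "$site/data" "$root/manifests"
  touch "$site/data/.keep" "$site/manifests/.keep" "$site/ext/site.pp"
  # -f replaces whatever already sits at the link path, -n avoids
  # following an existing link; check for a real site.pp there first.
  ln -sfn "$site/ext/site.pp" "$root/manifests/site.pp"
}

# Example: scaffold_site /etc/puppet
```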

What About Environments?

I experimented with environments throughout 2013 and 2014 and ultimately decided to not use them. By not using environments, my production sites now have multiple Puppet servers deployed: one for each project or domain of responsibility. If I used environments effectively, all of those Puppet servers could be combined into a single server.

However, I found that using environments in production caused some issues:

If the location that housed the main Puppet server was down (lost power, etc), no nodes anywhere could talk to Puppet.
If the filesystem on the central Puppet server became corrupt, it could affect all nodes. This happened to me in 2014, but the damage was restricted to only one project.
A single main Puppet server would require all projects to work on the same version of Puppet. This is not possible for some projects.
Similarly, upgrading Puppet means upgrading across the entire "federation".
Having to type extra characters and tabs to reach the environment got tedious (cd /etc/puppet/env<tab>/project-name<tab>/mod<tab>/site). Though this was not a significant reason, it was a sense of relief to just go back to /etc/puppet/mod<tab>/site.

With regard to development and testing deployments, it's far easier for me to use Vagrant to fire up a small Puppet server than to configure a central Puppet server with environments.

This isn't to say that environments are a useless feature. I've just found that they haven't worked well for my specific use-cases.

Building the Site Module

At this point, Puppet is installed and an empty site module exists. Now we'll begin using Puppet to configure the Puppet server itself as well as any other server you place under Puppet control.

NTP

Things begin to break when two servers with skewed times try communicating with each other. To ensure this doesn't happen, NTP will be installed and configured. First, install the puppetlabs/ntp Puppet module:

cd /etc/puppet/modules
puppet module install puppetlabs/ntp

A Note About the Module Subcommand

The puppet command has a built-in subcommand to install modules. It's able to find the module by looking it up at the Forge.

There are pros and cons to this. On one hand, it provides an easy way to install a module plus any other modules it depends on. On the other hand, if you install a module that has a conflicting dependency with another module, the command will break. Additionally, sometimes the version of the module hosted at the Forge is outdated. When this happens, you need to manually download the module from its designated home, usually GitHub.

I usually clone the modules directly from GitHub. The puppet module command was included in this tutorial as an example.

Keeping Track of Modules

The previous version of this article showed how to create a bash script that will re-download all modules you use. There's nothing wrong with that method and it works great.

The previous version also mentioned librarian-puppet. I tried to use it, but found its dependency resolution and module metadata checks to be way too strict.

Then I started using r10k which you can read about here. It's another great tool, but it didn't make a lot of sense to keep using it once I stopped using environments.

Dan Bode has a tool called librarian-puppet-simple that is a stripped down version of librarian-puppet. It simply installs a set of modules that you list in a Puppetfile – no dependency or metadata checks. Like Dan, librarian-puppet-simple is really awesome, and you should check it out. This is what I've been using for quite a while now and don't see a reason to stop.
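
For reference, a re-download script along the lines mentioned at the start of this section could look like the sketch below. This is not the script from the previous article: the list-file format (name, URL, and optional ref on each line) and the function names are assumptions made up for this example. Set DRY_RUN=1 to print the commands instead of running them:

```shell
# fetch_modules LIST DEST: re-clone every module named in LIST into DEST.
# LIST format (assumed for this sketch): "name url [ref]" per line;
# blank lines and lines starting with # are ignored.
fetch_modules() {
  list="$1"
  dest="$2"
  [ -n "$dest" ] || return 1         # never run rm -rf against "/name"
  while read -r name url ref || [ -n "$name" ]; do
    case "$name" in ''|'#'*) continue ;; esac
    run rm -rf "$dest/$name"
    run git clone --quiet "$url" "$dest/$name"
    if [ -n "$ref" ]; then
      run git -C "$dest/$name" checkout --quiet "$ref"
    fi
  done < "$list"
}

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Example: DRY_RUN=1; fetch_modules modules.txt /etc/puppet/modules
```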

Configuring NTP

Now that the puppetlabs/ntp module is installed, it can be used to install and configure NTP on any server under Puppet control. The puppetlabs/ntp module is a simple module and rarely needs any parameters.

In the site.pp file, add the following:

node 'puppet.example.com' {
  class { '::ntp': }
}

Roles and Profiles

There's an issue with this, though. This class will need to be applied to every server:

node 'puppet.example.com' {
  class { '::ntp': }
}

node 'www.example.com' {
  class { '::ntp': }
}

node 'db.example.com' {
  class { '::ntp': }
}

There's a lot of repeated configuration here and it'll only get worse as more modules are added. A better way to apply modules to nodes is to use the "Roles and Profiles" pattern. The end of this article has some links that will describe this pattern in detail.

A good profile to start with is the "base" profile. This profile will be applied to all servers, so it's important that this profile contains very generic and global settings. To start, create a new manifest called /etc/puppet/modules/site/manifests/profiles/base.pp with the following contents:

class site::profiles::base {
  class { '::ntp': }
}

Next, create a role in /etc/puppet/modules/site/manifests/roles/puppet_server.pp:

class site::roles::puppet_server {
  contain site::profiles::base
}

Finally, apply the role to the node:

node 'puppet.example.com' {
  contain site::roles::puppet_server
}
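
The file locations in this section follow Puppet's module autoloading convention: the first segment of the class name is the module, and the remaining segments map to directories and a .pp file under that module's manifests directory. The following sketch (class_to_path is a hypothetical helper, not a Puppet command) prints where a given class is expected to live:

```shell
# class_to_path MODULEPATH CLASS: print the manifest file the autoloader
# expects for CLASS. Hypothetical helper for illustration.
class_to_path() {
  modulepath="$1"
  class="${2#::}"                    # drop a leading "::" if present
  module="${class%%::*}"             # first segment is the module name
  if [ "$module" = "$class" ]; then
    rest="init"                      # a bare module class lives in init.pp
  else
    # remaining segments become directories plus the file name
    rest=$(printf '%s' "${class#*::}" | sed 's|::|/|g')
  fi
  printf '%s/%s/manifests/%s.pp\n' "$modulepath" "$module" "$rest"
}
```

For example, class_to_path /etc/puppet/modules site::roles::puppet_server prints /etc/puppet/modules/site/manifests/roles/puppet_server.pp.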

With only the ntp module being used in site::profiles::base, this actually seems more complicated. To better show the usefulness of profiles, add the following to the base profile:

$packages = ['git', 'vim']
package { $packages: ensure => latest }

Before, you would have had to add those two lines to each node. Now you add them to one profile and they get applied to every node that has the base profile.

All Your Base

Since my "base" settings are so common across different Puppet-controlled environments, I have started creating an actual "base" module called bass. I haven't yet decided to pronounce it bass as in "base" or bass as in the fish.

Class or Include or Contain?

There's a lot of great documentation on this subject that is written better than I ever could. See the end of this article for links. Once you've read everything and understand the history between these three keywords, here's my $0.02:

I use contain in my roles and nodes. This is because I enforce a no-parameter policy in roles and nodes.
I use class in my profiles since those do use parameters. I still sometimes use Anchors and explicit ordering, but I'm finding that these are not needed (as much) in Puppet 3.7+ and Puppet Server and by using manifest ordering.
Sometimes I'll throw a contain or include in the profiles, though, if I'm positive that I'll never need to add parameters to them and that their ordering is stable.

The First Puppet Run

At this point, Puppet can be run for the first time. If all goes well, NTP will be installed and running when Puppet has finished:

puppet apply --verbose /etc/puppet/manifests/site.pp

Configuring the Firewall

Next, the puppetlabs/firewall module will be used to build the basis of a deny-by-default firewall.

Install the puppetlabs/firewall module by cloning it from GitHub:

cd /etc/puppet/modules
git clone https://github.com/puppetlabs/puppetlabs-firewall firewall

Add it to your Puppetfile, too:

mod 'firewall',
  :git => 'https://github.com/puppetlabs/puppetlabs-firewall'

Next, a new manifest called /etc/puppet/modules/site/manifests/firewall.pp will be created:

class site::firewall {
  firewall { '000 accept all icmp':
    proto  => 'icmp',
    action => 'accept',
  }

  firewall { '001 accept all to lo interface':
    proto   => 'all',
    iniface => 'lo',
    action  => 'accept',
  }

  firewall { '002 accept related established rules':
    proto   => 'all',
    ctstate => ['RELATED', 'ESTABLISHED'],
    action  => 'accept',
  }

  firewall { '999 drop all':
    proto  => 'all',
    action => 'drop',
  }
}

Next, add the following to the site::profiles::base profile, before the base packages are applied:

class { '::firewall': }
class { '::site::firewall': }

Now all servers will have a deny-by-default firewall applied. I wouldn't recommend applying this configuration yet because you'll be locked out if you're working on this server remotely.

Hiera and the Firewall

This is a good place to introduce Hiera - a tool to store structured configuration data outside of Puppet manifests. Hiera is installed by default with the Puppet package, so installing a hiera package is not needed. However, to utilize Hiera's merging feature, the deep_merge gem needs to be installed:

gem install deep_merge

To configure Hiera, create /etc/puppet/modules/site/ext/hiera.yaml with the following contents:

---
:backends:
  - yaml

:hierarchy:
  - "common"

:yaml:
  :datadir: /etc/puppet/modules/site/data

:merge_behavior: deeper

Next, link this configuration file to two locations:

ln -s /etc/puppet/modules/site/ext/hiera.yaml /etc/
ln -s /etc/puppet/modules/site/ext/hiera.yaml /etc/puppet

The first location is so you can use the hiera command-line tool. The second location is for Puppet itself.

At the moment, the only hierarchy level configured is common. This means that Hiera will only read data from a single file: /etc/puppet/modules/site/data/common.yaml.

Add the following to that file:

---
trusted_networks:
  - '192.168.1.0/24'
  - '10.255.0.0/24'

Add any other networks or hosts (/32) to this list that you need.

Next, add the following before the 999 rule in site::firewall:

$trusted_networks = hiera_array('trusted_networks')
$trusted_networks.each |$network| {
  firewall { "003 allow all traffic from ${network}":
    proto  => 'all',
    source => $network,
    action => 'accept',
  }
}

This block of Puppet code uses Puppet's iteration feature from the future parser, so you'll need to make sure you have it enabled in puppet.conf.

Now the next time you apply the Puppet configuration, a deny-by-default firewall will be enabled with explicit allow rules for each trusted network you specified in Hiera.
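
To see what the each loop expands to, the rough sketch below reads the trusted_networks list straight from the YAML file and prints the title of each firewall resource that would be declared. It is only an illustration: yaml_list and rule_titles are hypothetical helpers, and the awk parsing handles nothing beyond the simple list format shown above:

```shell
# yaml_list FILE KEY: print the items of a top-level YAML list, one per line.
# Only understands "key:" followed by indented "- item" lines.
yaml_list() {
  awk -v key="$2" '
    $0 ~ "^" key ":" { inlist = 1; next }    # start of the wanted list
    inlist && /^[ \t]+-[ \t]/ {
      sub(/^[ \t]+-[ \t]+/, "")              # drop the "- " marker
      gsub(/^["\047]|["\047]$/, "")          # drop surrounding quotes
      print
      next
    }
    /^[^ \t#]/ { inlist = 0 }                # another top-level key ends it
  ' "$1"
}

# rule_titles FILE: print one firewall resource title per trusted network.
rule_titles() {
  yaml_list "$1" trusted_networks | while read -r network; do
    echo "003 allow all traffic from ${network}"
  done
}

# Example: rule_titles /etc/puppet/modules/site/data/common.yaml
```

Because the network is part of each title, every resource title stays unique, which Puppet requires.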

My Hiera Structure

Over the past year+, I have standardized on the following Hiera hierarchy:

:hierarchy:
  - "node/%{::hostname}"
  - "location/%{::location}"
  - "role/%{::role}"
  - common

"node" is the hostname (or fqdn) of the node. While the idea of having node-specific settings goes against the "Pets v Cattle" argument, it's sometimes unavoidable. In my production environments, only a small percentage of nodes have their own node-specific settings, and even then it's maybe one or two values.
"location" is an arbitrary fact to group nodes:

mkdir -p /etc/facter/facts.d
# Run only one of the following on each node:
echo location=honolulu > /etc/facter/facts.d/location.txt
echo location=maui > /etc/facter/facts.d/location.txt
echo location=dc1 > /etc/facter/facts.d/location.txt

"role" is another fact that matches the role that the node will have applied:

echo role=puppet_server > /etc/facter/facts.d/role.txt

Be careful about using Facter to categorize nodes on the node itself! Let's say a node with a role of dns was compromised and the intruder understood that they could change the role to mysql by replacing the fact in /etc/facter/facts.d/role.txt. On the next Puppet run, MySQL will be installed, which would apply sensitive information to the node that the intruder now has access to. It might also break your DNS server.
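
To make the lookup order of the hierarchy above concrete, here is a small sketch that lists the YAML files Hiera's yaml backend would consult for a given node, most specific first. hiera_files is a hypothetical helper, and the argument values stand in for the hostname, location, and role facts:

```shell
# hiera_files DATADIR HOSTNAME LOCATION ROLE: print the data files in
# lookup order and mark which ones exist. Hypothetical helper.
hiera_files() {
  datadir="$1"
  for level in "node/$2" "location/$3" "role/$4" "common"; do
    file="$datadir/$level.yaml"
    if [ -f "$file" ]; then
      echo "found   $file"
    else
      echo "missing $file"
    fi
  done
}

# Example:
#   hiera_files /etc/puppet/modules/site/data puppet dc1 puppet_server
```

A value set in an earlier file wins over the same key in a later one, so common.yaml only supplies whatever the more specific levels leave unset.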

What Goes in Hiera and What Uses Hiera?

My personal rules of Hiera data are:

Hiera data is used only in profiles. If I write my own module that will be located in /etc/puppet/modules, I still use class parameters.
Since I only use Hiera in profiles, that means all Hiera data is site-specific. So the deciding question becomes: "what information needs to be stripped from this module to allow it to work in another environment?" That data is then moved to Hiera.

Finishing up the Puppet Server Role

Up until now, broad configurations that could be applied to any node have been used. Now we'll create a more specific role and profile to configure the Puppet Server.

In order to do this, several modules will be needed:

mod 'concat',
  :git => 'https://github.com/puppetlabs/puppetlabs-concat',
  :ref => '1.1.2'

mod 'inifile',
  :git => 'https://github.com/puppetlabs/puppetlabs-inifile',
  :ref => '792d35cdb48fc2cba08ab578c1b7bc42ef3a0ace'

mod 'puppet',
  :git => 'https://github.com/jtopjian/puppet-puppet',
  :ref => 'puppetserver'

mod 'puppetdb',
  :git => 'https://github.com/puppetlabs/puppetlabs-puppetdb',
  :ref => '4.1.0'

mod 'postgresql',
  :git => 'https://github.com/puppetlabs/puppetlabs-postgresql',
  :ref => '4.1.0'

mod 'stdlib',
  :git => 'https://github.com/puppetlabs/puppetlabs-stdlib',
  :ref => '4.4.x'

Ref?

In production, I mark all modules in my Puppetfile with a reference. This reference is the known working release or commit of that module. This means that if I ever want to test an updated module, I'll need to actually create a test environment, rebuild everything, and confirm it works. But the alternative of just going cowboy and deploying all new releases to the production environment will only cause a lot of pain (and downtime).
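
One way to keep that policy honest is to scan the Puppetfile for entries without a :ref. A minimal sketch, assuming the mod / :git / :ref layout shown above (unpinned_modules is a hypothetical helper, not part of librarian-puppet-simple):

```shell
# unpinned_modules PUPPETFILE: print the name of every mod entry that has
# no :ref line. Hypothetical helper for illustration.
unpinned_modules() {
  awk '
    # print the previous entry if it never declared a :ref
    function report() { if (name != "" && !pinned) print name }
    /^[ \t]*mod[ \t]/ {
      report()
      name = $2
      gsub(/[\047",]/, "", name)     # strip quotes and the trailing comma
      pinned = 0
    }
    /:ref[ \t]*=>/ { pinned = 1 }
    END { report() }
  ' "$1"
}

# Example: unpinned_modules Puppetfile
```

Run against a Puppetfile containing the entries in this article, it would print firewall, since that entry was added without a :ref.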

Creating the Puppet Server Profile

Create /etc/puppet/modules/site/manifests/profiles/puppet/server.pp with the following contents:

class site::profiles::puppet::server {

  # Hiera
  $main_settings           = hiera('site::puppet::settings::main')
  $agent_settings          = hiera('site::puppet::settings::agent')
  $master_settings         = hiera('site::puppet::settings::master')
  $server_default_settings = hiera('site::puppet::settings::server_default')
  $puppet_package_ensure   = hiera('site::puppet::puppet_package_ensure')
  $server_package_ensure   = hiera('site::puppet::server_package_ensure')

  # Resources
  class { '::puppet':
    server                  => true,
    main_settings           => $main_settings,
    agent_settings          => $agent_settings,
    master_settings         => $master_settings,
    server_default_settings => $server_default_settings,
    puppet_package_ensure   => $server_package_ensure,
  }

  class { 'puppetdb': }
  class { 'puppetdb::master::config': }

}

This example profile is much more fleshed out than the previous site::profiles::base profile to show the format I'm currently using in my production profiles.

In the common.yaml Hiera file, add this:

# Puppet
site::puppet::puppet_package_ensure: 'latest'
site::puppet::server_package_ensure: 'latest'
site::puppet::settings::main:
  server: 'puppet'
  parser: 'future'
  ordering: 'manifest'
  pluginsync: true
  logdir: '/var/log/puppet'
  vardir: '/var/lib/puppet'
  ssldir: '/var/lib/puppet/ssl'
  rundir: '/var/run/puppet'
site::puppet::settings::agent:
  certname: "%{::fqdn}"
  show_diff: true
  splay: false
  configtimeout: 360
  usecacheonfailure: true
  report: true
  environment: "%{::environment}"
site::puppet::settings::server_default:
  JAVA_ARGS: '-Xms1g -Xmx1g -XX:MaxPermSize=256m'
site::puppet::settings::master:
  ca: true
  ssldir: '/var/lib/puppet/ssl'
puppetdb::master::config::restart_puppet: false

You can see how each Hiera item matches the corresponding section in the profile except for puppetdb::master::config::restart_puppet. This is because puppetdb::master::config::restart_puppet is an Automatic Parameter Lookup. Declaring this in Hiera is the same as if you did:

class { 'puppetdb::master::config':
  restart_puppet => false,
}

Now create the /etc/puppet/modules/site/manifests/profiles/puppet/agent.pp profile:

class site::profiles::puppet::agent {

  # Hiera
  $main_settings         = hiera('site::puppet::settings::main')
  $agent_settings        = hiera('site::puppet::settings::agent')
  $puppet_package_ensure = hiera('site::puppet::puppet_package_ensure')

  # Resources
  class { '::puppet':
    main_settings         => $main_settings,
    agent_settings        => $agent_settings,
    puppet_package_ensure => $puppet_package_ensure,
  }

}

Now update the role for the Puppet server so that it also contains the new server profile:

class site::roles::puppet_server {
  contain site::profiles::base
  contain site::profiles::puppet::server
}

With all of this in place, run Puppet:

puppet apply --verbose /etc/puppet/manifests/site.pp

When everything has finished, you should now be able to switch to using puppet agent instead of puppet apply:

puppet agent -t --noop

My Opinion on Automatic Parameter Lookups

I think they're a great idea, but ultimately they're too "magical" and unintuitive. There's no easy way to tell if they're being used by reading the Puppet manifests – you have to read both the manifests and Hiera data and correlate the two data sources.

If there are any automatic lookups in my Hiera data, it's because I got lazy. It happens.
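
That correlation can be partly automated. The sketch below lists the top-level keys in a Hiera data file that no manifest references through an explicit hiera('...')-style call; whatever it prints is either an automatic parameter lookup or unused data. implicit_keys is a hypothetical helper, and the matching is a plain text search, so treat its output as a hint rather than proof:

```shell
# implicit_keys DATAFILE MANIFESTDIR: print top-level Hiera keys that no
# file under MANIFESTDIR names in a ('key') or ("key") function argument.
# Hypothetical helper for illustration.
implicit_keys() {
  # Top-level keys are unindented "name:" lines; comments, "---" and
  # indented (nested) keys do not match.
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_:]*\):.*$/\1/p' "$1" |
  while read -r key; do
    if ! grep -rqF -e "('$key')" -e "(\"$key\")" "$2"; then
      echo "$key"
    fi
  done
}

# Example:
#   cd /etc/puppet/modules/site
#   implicit_keys data/common.yaml manifests
```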

Committing Everything

A lot of work has been done here. To see all of the changes that were made, do the following:

cd /etc/puppet/modules/site
git status

All of this should be committed into git:

git add .
git commit -m "Created base profile, puppet server profile and role, configured hiera."
git push -u origin master

Conclusion

This tutorial was an updated version of my 2013 Puppet Infrastructure tutorial. It described how to install and configure Puppet Server, PuppetDB, and Hiera as well as how to lay the foundation of a maintainable Puppet environment.

In addition, I've included notes on what I've learned from using Puppet over the past year.

Resources

And as mentioned throughout this article:

Gary Larizza's blog. Read everything here. Twice.
Designing Puppet - Roles and Profiles
Puppet Roles and Profiles with a Simple Module Structure (part 2)
Roles and Profiles slidedeck
Puppet Containment
Puppet Automatic Parameters Lookup
