Introduction

Cron is an essential *nix package that runs silently in the background, executing scheduled jobs. The most common version of cron is Vixie Cron. Coincidentally, Vixie Cron is also the most vanilla version: it doesn’t provide any of the flashy features that other variants have, just the ability to read a crontab and run scheduled jobs.

This, of course, is fine for most servers, but if a server is configured to run several cron jobs every minute, the server could easily become overloaded. I recently ran into this problem and, rather than replace the standard cron package with something like fcron, I decided to hack on Vixie Cron for fun to see what I could do myself.

What Changes I Wanted to Make

When cron wakes up every minute, it checks to see what jobs need to be run and executes them all at once. So if there are 100 jobs configured to run, 100 jobs will run at once. I wanted to stop this from happening, so the main feature I wanted to add was for cron to be able to queue jobs.

The second modification is related to the first: while I could hard-code a queue length inside the cron source, I would rather have the administrator be able to specify what the length should be. I wanted to add this feature as a command-line switch.

Disclaimer

I am not an expert at C programming. What follows is simply me fooling around for fun. While I was able to get this to work, I would strongly advise against running it in production. Even I did not.

Tools and Setup

I worked on Vixie Cron 4.1 from CentOS 5.3. The build and test machine was a standard install of CentOS 5.3.

I followed the instructions at Owl River about how to create an RPM patching and development environment. As that page explains how to do this much better than I could, I would recommend just reading it there.

One difference, though, is that RHEL provides 70 patches to the base Vixie Cron source. I did not realize this when I first made my changes, so when I created my final patch, it did not apply cleanly on top of the other 70. Therefore, apply all of the included patches to the source files before proceeding. I applied them in the order in which they are listed in the vixie-cron.spec file.

Dissecting Cron

Once I had my environment set up, I proceeded to read through the source code and try to find my way around.

For how widespread and important cron is, Vixie Cron is very simple.

Adding the Command Line Switch

cron.c is the file that contains the main() function. This file also contains the area where cron parses the command line options. Interestingly, the unpatched version of Vixie Cron does not use a getopt feature — it manually parses the command line options itself. One of the RHEL patches added getopt later.

I decided to use q, for “queuing”, as the switch letter. The original line looked like:

while (-1 != (argch = getopt(argc, argv, "npx:m:"))) {

And my edit changed it to:

while (-1 != (argch = getopt(argc, argv, "npx:m:q:"))) {

Below that, I added my case for the switch statement:

        case 'q':
            NumProcs = atoi(optarg);
            NumProcs += 1;
            break;

The variable also needs a default value meaning “no limit” before the options are parsed, so somewhere at the top I added:

NumProcs = -1;

I later found I also had to add a line to globals.h to make NumProcs global to all files:

XTRN int       NumProcs INIT(0);
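The XTRN and INIT macros are what let that single line serve as both the definition (in one file) and an extern declaration (in every other file). The sketch below shows the general idiom only; the macro bodies are an assumption about how globals.h defines them, and DemoNumProcs is a hypothetical stand-in for NumProcs.

```c
/* Sketch of the XTRN/INIT idiom. The macro bodies are an assumption,
 * not copied from the cron source. */
#define MAIN_PROGRAM            /* only the file that owns the globals defines this */

#ifdef MAIN_PROGRAM
# define XTRN                   /* owning file: a real definition */
# define INIT(x) = x            /* ...with an initializer */
#else
# define XTRN extern            /* every other file: a declaration only */
# define INIT(x)
#endif

XTRN int DemoNumProcs INIT(0);  /* hypothetical stand-in for NumProcs */
```

In the owning file the last line expands to `int DemoNumProcs = 0;`, and everywhere else to `extern int DemoNumProcs;`.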

With these changes, my new command line switch was in place. If an administrator wanted to make cron only run 2 jobs in the queue at a time, he or she would specify:

crond -q2

Now it’s time to add the queuing feature.

Queuing

job.c is the file which controls the job execution. The function job_runqueue() is where the execution actually happens. It’s here where you can see why the jobs all run at once. There is no timing control in the for loop — the job queue is looped and all jobs are fired off as fast as the CPU can run.

I did some quick reading on UNIX processes and forking and came up with the following modifications to the function.

Here is the original:

int
job_runqueue(void) {
    job *j, *jn;
    int run = 0;

    for (j = jhead; j; j = jn) {
        do_command(j->e, j->u);
        jn = j->next;
        free(j);
        run++;
    }
    jhead = jtail = NULL;
    return (run);
}

And here is my modified version:

int
job_runqueue(void) {
    job *j, *jn;
    int run = 0;
    int numRunning = 0;
    WAIT_T waiter;
    PID_T pid;

    for (j = jhead; j; j = jn) {
        numRunning += 1;
        if (NumProcs != -1) {
            if (numRunning >= NumProcs) {
                while ((pid = wait(&waiter)) < OK && errno == EINTR)
                    ;
                numRunning -= 1;
            }
        }
        do_command(j->e, j->u);
        jn = j->next;
        free(j);
        run++;
    }
    jhead = jtail = NULL;
    return (run);
}

First I added a variable to keep track of how many jobs are running. I then created variables for the process control: waiter and pid.

When the function runs the loop, it checks whether NumProcs is set to something other than -1, in other words, whether a q switch was supplied. If so, it applies the control.

The control simply says: if the running count has reached NumProcs (the value given with q, plus one, because the count is incremented before the check), call wait() and block until one job finishes. When a job does finish, decrement the number of jobs currently running by 1.
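The same throttling idea can be tried outside of cron. The following is a minimal, self-contained sketch rather than the patched cron code: run_throttled is a hypothetical helper, the children exit immediately instead of running a real job, and the limit is compared directly instead of using the plus-one offset. Unlike job_runqueue(), it also reaps the remaining children at the end so that it can stand alone.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `njobs` children, never more than `limit`
 * at once (limit <= 0 means no limit). Returns the highest number of
 * children that were unreaped at the same time, or -1 if fork() fails. */
static int run_throttled(int njobs, int limit) {
    int running = 0;
    int peak = 0;
    int failed = 0;
    int i;

    for (i = 0; i < njobs; i++) {
        pid_t pid;

        /* At the limit: block until one child exits before starting another. */
        if (limit > 0 && running >= limit) {
            while (wait(NULL) < 0 && errno == EINTR)
                ;
            running--;
        }

        pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(0);           /* child: a real job would exec here */

        running++;
        if (running > peak)
            peak = running;
    }

    /* Reap whatever is still outstanding. */
    while (running > 0) {
        if (wait(NULL) >= 0)
            running--;
        else if (errno != EINTR)
            break;
    }
    return failed ? -1 : peak;
}
```

Calling `run_throttled(5, 2)` starts five children but never has more than two outstanding. Note that the parent blocks inside wait() while it is at the limit.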

First Run

I created a cron table with some test jobs:

* * * * * sleep 10; echo `date` >> /home/jtopjian/t.txt
* * * * * sleep 10; echo `date` >> /home/jtopjian/t.txt
* * * * * sleep 10; echo `date` >> /home/jtopjian/t.txt
* * * * * sleep 75; echo `date` >> /home/jtopjian/t.txt

I ran cron on the command line in Debug mode. Everything worked perfectly. If I set the queue to 1, only one job would run at a time.

Unfortunately, I found a problem. If all jobs took more than 60 seconds to run, cron would skip the next minute’s crons. This was very bad as potentially important jobs would never run.

I re-read the code to find the problem. After an hour of reading and researching, it finally made sense: cron is a single-threaded program. While it is blocked in wait() for running jobs, it cannot also keep track of its schedule. The reason cron executes all jobs at once is that it needs to return to its sleep schedule as quickly as possible so it won’t miss the next run.

In order to solve this problem, I would need to turn cron into a multi-threaded program. My idea was that any time the job queue needed to run, a new thread would be created. The thread would act as its own entity and run the queued jobs without worrying about any other part of the program. Though changing the architecture of cron in this way sounded daunting, it actually wasn’t hard at all.

Adding Threading Support

YoLinux provides a wonderful tutorial on Linux Threads. It was all I needed to learn how to add in my threading support.

The first step was to include the threading header in both cron.c and job.c:

#include <pthread.h>

I then had to declare some variables which would control and work with the threads:

pthread_t thread1;
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int rc1;

Next, I modified the way job_runqueue() was called in cron.c. It originally looked simply like:

job_runqueue();

With my modifications:

if ((rc1 = pthread_create(&thread1, NULL, job_runqueue, NULL))) {
    Debug(DSCH, ("[%ld] thread failed\n", (long)getpid()));
}
pthread_join(thread1, NULL);
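Two details of the pthreads API are worth keeping in mind here: pthread_create() expects a start routine of type void *(*)(void *), and pthread_join() blocks the caller until the thread has finished. A minimal sketch of adapting an int-returning function to that signature is shown below. The names demo_job_runqueue and runqueue_thread are hypothetical stand-ins, not the code from my patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for job_runqueue(): pretend three jobs ran. */
static int demo_job_runqueue(void) {
    return 3;
}

/* Wrapper with the signature pthread_create() requires.
 * `arg` points to an int that receives the number of jobs run. */
static void *runqueue_thread(void *arg) {
    int *result = arg;

    *result = demo_job_runqueue();
    return NULL;
}

/* Starts the worker thread. Returns 0 on success or a pthread error number. */
static int start_runqueue_thread(pthread_t *thread, int *result) {
    return pthread_create(thread, NULL, runqueue_thread, result);
}
```

A caller that needs the result calls pthread_join() on the thread and waits for it. A caller that has to get back to its own schedule straight away would more typically call pthread_detach() and not wait at all.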

The next step was to modify the job.c file and add threading support. The final job_runqueue() function looked like this:

int
job_runqueue(void) {
    job *j, *jn;
    int run = 0;
    int numRunning = 0;
    WAIT_T waiter;
    PID_T pid;
    pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

    Debug(DSCH, ("[%ld] thread number %ld\n", (long)getpid(), pthread_self()));
    pthread_mutex_lock(&mutex1);
    for (j = jhead; j; j = jn) {
        numRunning += 1;
        if (NumProcs != -1) {
            if (numRunning >= NumProcs) {
                while ((pid = wait(&waiter)) < OK && errno == EINTR)
                    ;
                numRunning -= 1;
            }
        }
        do_command(j->e, j->u);
        jn = j->next;
        free(j);
        run++;
    }
    jhead = jtail = NULL;
    pthread_mutex_unlock(&mutex1);
    return (run);
}
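A mutex only provides mutual exclusion between threads that lock the same mutex object, so a mutex meant to protect a shared queue is normally declared at file scope rather than as a local variable. The sketch below illustrates that pattern on a toy queue. All of the demo_ names are hypothetical, and taking the whole list under the lock before processing it is a design choice of this sketch, not something the patched cron does.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct demo_job {
    struct demo_job *next;
    int id;
} demo_job;

/* Shared state: one queue and one lock, visible to every thread. */
static demo_job *demo_head = NULL, *demo_tail = NULL;
static pthread_mutex_t demo_queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a job to the shared queue. Returns 0 on success, -1 on failure. */
static int demo_enqueue(int id) {
    demo_job *j = malloc(sizeof *j);

    if (j == NULL)
        return -1;
    j->id = id;
    j->next = NULL;

    pthread_mutex_lock(&demo_queue_lock);
    if (demo_tail != NULL)
        demo_tail->next = j;
    else
        demo_head = j;
    demo_tail = j;
    pthread_mutex_unlock(&demo_queue_lock);
    return 0;
}

/* Detach the whole list while holding the lock, then walk it without
 * the lock so other threads can keep queuing new jobs. */
static int demo_runqueue(void) {
    demo_job *j, *jn;
    int run = 0;

    pthread_mutex_lock(&demo_queue_lock);
    j = demo_head;
    demo_head = demo_tail = NULL;
    pthread_mutex_unlock(&demo_queue_lock);

    for (; j != NULL; j = jn) {
        jn = j->next;
        /* a real implementation would start the job here */
        free(j);
        run++;
    }
    return run;
}
```

Because each caller of demo_runqueue() walks only the list it detached, two threads never free the same job.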

The final step was to edit the Makefile so cron compiled with the pthreads library:

LDFLAGS     =   -g -lpthread

Second Run

With all of these changes made, I re-ran my tests. The threading library worked perfectly. Even if a queue of jobs took over a minute to complete, cron would create another thread with a whole new bank of jobs.

Conclusion

By toying around with the cron source code, I was able to learn the detailed workings of Vixie Cron as well as some basic UNIX C programming.

I’ve uploaded my final patch and my modified vixie-cron.spec file so readers can compile this version of cron themselves.

If anyone has any input, corrections, comments, or ideas on this topic and the changes I made, I would love to hear them.

I realize this article is rather unorganized and missing a lot of details. If anyone needs anything clarified, please let me know.


Comments

Normalex on October 2nd, 2011:

Joe, I had almost started modifying it in the same direction when I came across your article by accident. Very nice toying. Would you like to work on it some more together and turn it into a nice, complete feature? Here is what I wanted to change:

* a single execution queue
* on each child’s completion, log the process’s execution statistics, such as total time and memory (getrusage())
* additional parameters, like a list of users to exclude from queuing
* we might create a global timer and handle the queue in the main thread; this way there is no need to create many inherited threads. The timer would queue jobs and fork directly for those users that should not be queued.
* each job completion would trigger a pop of the next job from the queue, rather than being loop based

Kevin O’Mara on May 5th, 2012:

I have just finished with some toying of the Vixie Cron source myself.

I just produced the following patch using the Cygwin patched source of Vixie Cron, but I imagine that my patch will work on other platforms as well (haven’t had the opportunity to test yet). The platform I really needed this functionality on was under Windows 7.

This patch modifies cron so that it will execute a job consistently within a matter of seconds. I have stripped out the other “time bases” (minute, hour, etc.), since it doesn’t make sense to leave them in when considering seconds (trust me). Take a look, for you’ll find it isn’t really a kludge :) .

Here is the page for my cron ‘Seconds’ patch for all those who are interested:

http://www.sandbars.org/cronSecondsPatch/

Hope you find this as useful as I have.