TaskForest
A simple, expressive, open-source, text-file-based Job Scheduler with console, HTTP, and RESTful API interfaces.
Jobs

A job is defined as any executable program that resides on the file system. It is represented as a file in the file system whose name is the same as the job name. Jobs can depend on each other. Jobs can also have start times before which they may not be run.

When a job is run by the run wrapper (bin/run), two status semaphore files are created in the log directory. The first is created when a job starts and has a name of $FamilyName.$JobName.pid. This file contains some attributes of the job. When the job completes, more attributes are written to this file.

When the job completes, another semaphore file is written to the log directory. The name of this file will be $FamilyName.$JobName.0 if the job ran successfully, and $FamilyName.$JobName.1 if the job failed. In either case, the file will contain the exit code of the job (0 in the case of success and non-zero otherwise).
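
The naming scheme above lends itself to a simple status check. The following Python sketch is illustrative only and is not part of TaskForest; the function name job_outcome and the strings it returns are assumptions made for this example. It only looks for the three file names described above.

```python
import os

def job_outcome(log_dir, family, job):
    """Classify a job from the semaphore files described above.

    The returned labels are made up for this sketch; they are not
    TaskForest's own status names.
    """
    base = os.path.join(log_dir, f"{family}.{job}")
    if os.path.exists(base + ".0"):    # written when the job succeeds
        return "success"
    if os.path.exists(base + ".1"):    # written when the job fails
        return "failure"
    if os.path.exists(base + ".pid"):  # written when the job starts
        return "started"
    return "not started"
```

For example, job_outcome("/path/to/logs", "F_ADMIN", "J_ROTATE_LOGS") would report "success" once the file F_ADMIN.J_ROTATE_LOGS.0 exists in that (hypothetical) log directory.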

When a job is run by the run_with_log run wrapper, any output the job sends to stdout or stderr will be captured and stored in a file called $FamilyName.$JobName.$pid.$start_time.stdout in the log directory.

Within TaskForest, every job has a status, which is one of the following values:

Jobs & Families

Jobs can be grouped together into ``Families.'' A family has a start time associated with it before which none of its jobs may run. A family also has either (a) a list of days-of-the-week or (b) a calendar associated with it. Jobs within a family may only run on the days specified by the days-of-the-week or the calendar.

Jobs and families are given simple names. A family is described in a family file whose name is the family name. Each family file is a text file that contains 1 or more job names. The layout of the job names within a family file determines the dependencies between the jobs (if any). There are several reasons why text files are a good choice for Family files.

Family names and job names should contain only the characters shown below:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_
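
As a quick way to check a name against the character set above, here is a small Python sketch. The helper name is_valid_name is an assumption for illustration; TaskForest itself may validate names differently.

```python
import re

# Letters, digits and underscore only, as listed above.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(name):
    """Return True if name uses only the permitted characters."""
    return NAME_RE.fullmatch(name) is not None
```

With this check, "J_ROTATE_LOGS" passes while a name such as "J-ROTATE-LOGS" does not, because the dash is not in the list.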

Let's see a few examples. In these examples the dashes (-), pipes (|) and line numbers are not parts of the files. They're only there for illustration purposes. The main script expects environment variables, command line options or configuration file settings that specify the locations of the directory that contains family files, the directory that contains job files, and the directory where the logs will be written. The directory that contains family files should contain only family files.

EXAMPLE 1 - Family file named F_ADMIN

   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', days => 'Mon,Wed,Fri'
02 |
03 | J_ROTATE_LOGS()
04 |
   +-------------------------------------------------------
  

The first line in any family file always contains 3 bits of information about the family: the start time, the time zone, and either the days on which the jobs in this family are run or the calendar that specifies on which dates the jobs in this family are run.

In this case, this family starts at 2:00 a.m. GMT, as given by tz => 'GMT'. For a time zone that observes daylight saving time, the start time is adjusted accordingly. This family 'runs' on Monday, Wednesday and Friday only. Pay attention to the format: it's important.

Valid days are 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'. Days must be separated by commas.

All start times (for families and jobs) are in 24-hour format. '00:00' is midnight, '12:00' is noon, '13:00' is 1:00 p.m. and '23:59' is one minute before midnight.
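
To make the header format concrete, the following Python sketch parses a first line like the one in Example 1 and applies the day and time rules described above. It is an approximation written for this page, not TaskForest's own parser; the function name parse_family_header and the regular expressions are assumptions.

```python
import re

VALID_DAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
PAIR_RE = re.compile(r"(\w+)\s*=>\s*'([^']*)'")    # key => 'value'
TIME_RE = re.compile(r"([01]\d|2[0-3]):[0-5]\d")   # 24-hour HH:MM

def parse_family_header(line):
    """Parse the first line of a family file into a dict."""
    fields = dict(PAIR_RE.findall(line))
    if not TIME_RE.fullmatch(fields.get("start", "")):
        raise ValueError("start must be HH:MM in 24-hour format")
    if "days" in fields:
        days = fields["days"].split(",")   # days are comma-separated
        bad = [d for d in days if d not in VALID_DAYS]
        if bad:
            raise ValueError(f"invalid days: {bad}")
        fields["days"] = days
    return fields
```

Given the header from Example 1, the result would contain start '02:00', tz 'GMT' and the day list Mon, Wed, Fri. A header that uses calendar => '...' instead of days is returned with the calendar name as a plain string.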

There is only one job in this family - J_ROTATE_LOGS. This family will start at 2:00 a.m., at which time J_ROTATE_LOGS will immediately be run. Note the empty parentheses [()]. These are required.

What does it mean to say that J_ROTATE_LOGS will be run? It means that the system will look for a file called J_ROTATE_LOGS in the directory that contains job files. That file should be executable. The system will execute that file (run that job) and keep track of whether it succeeded or failed. The J_ROTATE_LOGS script can be any executable file: a shell script, a perl script, a C program etc.

To run the program, the system actually runs a wrapper script that invokes the job script. The location of the wrapper script is specified on the command line or in an environment variable.

Now, let's look at a slightly more complicated example:

EXAMPLE 2 - Job Dependencies

This family file is named WEB_ADMIN.


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

A few things to point out here:

It is possible to have a dependency on a job that's in another family. If, for example, J_ROTATE_LOGS was in the Family named LOGS, then the family above would look like this:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |            LOGS::J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

An external job dependency is different from 'normal' job dependencies, because unlike 'normal' dependencies, it specifies only the dependency, and not when the external job should run. This means that looking at the above family file, we cannot say when J_ROTATE_LOGS will run. More accurately, we cannot say when LOGS::J_ROTATE_LOGS will run. All we know is that after it runs, J_RESOLVE_DNS and Delete_Old_Logs can run (after 2:00 GMT).

This also means that external job dependencies may only be specified on the first line of a Family, or the first line of a group of jobs (see example 4). Therefore the following is not allowed:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |            LOGS::J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |            REPORTS::J_WEB_REPORTS()  # BAD!
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

To see how this should be written, we need to know about Job Forests. Since that's described in example 4, we'll defer the solution until then.

One last thing about external job dependencies: just because we're waiting on a job in another family, that doesn't mean that the same job cannot be run in this family. For example, the following is permitted:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |            LOGS::J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
11 |                J_ROTATE_LOGS()   # This is a different
12 |                                  # job!
13 |
   +-------------------------------------------------------
  

The family will not start until J_ROTATE_LOGS has run from the LOGS family. The last job run by this family will be J_ROTATE_LOGS. It has nothing to do with the instance of the job that ran in the LOGS family. Line 11 will actually run the job, while line 3 only checks whether the job has run (by another family). That's what I mean when I say that external dependencies only specify the dependencies, while normal dependencies also specify when the job should run.
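
The difference between a local job and an external dependency is visible in the text of the family file: an external dependency carries a Family:: prefix. The Python sketch below pulls job references out of one line and reports the family prefix when there is one. The function name parse_job_refs and the regular expression are assumptions for illustration and do not reflect how TaskForest parses family files internally.

```python
import re

# Optional "FAMILY::" prefix, a job name, then parentheses with optional arguments.
JOB_REF_RE = re.compile(r"(?:(\w+)::)?(\w+)\s*\(([^)]*)\)")

def parse_job_refs(line):
    """Return (family_or_None, job_name) for every job on the line."""
    line = line.split("#", 1)[0]   # drop a trailing comment
    return [(family or None, job)
            for family, job, _args in JOB_REF_RE.findall(line)]
```

Applied to the family above, line 03 yields ("LOGS", "J_ROTATE_LOGS"), an external dependency, while line 11 yields (None, "J_ROTATE_LOGS"), a job that this family actually runs.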

EXAMPLE 3 - Time Dependencies

Let's say that we don't want J_RESOLVE_DNS to start before 9:00 a.m. because it's very IO-intensive and we want to wait until that relatively quiet time. In that case, we can put a time dependency on the job. This adds a restriction to the job, saying that it may not run before the time specified. We would do this as follows:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS(start => '09:00')  Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

J_ROTATE_LOGS will still start at 2:00, as always. As soon as it succeeds, Delete_Old_Logs is started. If J_ROTATE_LOGS succeeds before 09:00, the system will wait until 09:00 before starting J_RESOLVE_DNS. It is possible that Delete_Old_Logs would have started and completed by then. J_WEB_REPORTS would not have started in that case, because it is dependent on two jobs, and both of them have to run successfully before it can run.
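
The interaction between job dependencies and a time dependency can be reduced to a small calculation: a job becomes eligible at the later of (a) the moment its last dependency succeeds and (b) its own start time. The Python sketch below illustrates this under simplifying assumptions: all times are in the same time zone, every dependency succeeds, and the completion times passed in are hypothetical. The function names are invented for this example.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def earliest_start(deps_done_at, not_before=None):
    """Earliest 'HH:MM' at which a job may start.

    deps_done_at: completion times of the jobs it depends on.
    not_before:   the job's own start => '...' value, if any.
    """
    ready = max((to_minutes(t) for t in deps_done_at), default=0)
    if not_before is not None:
        ready = max(ready, to_minutes(not_before))
    return f"{ready // 60:02d}:{ready % 60:02d}"
```

If J_ROTATE_LOGS finished at, say, 02:10, then earliest_start(["02:10"]) gives 02:10 for Delete_Old_Logs, while earliest_start(["02:10"], "09:00") gives 09:00 for J_RESOLVE_DNS.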

For completeness, you may also specify a timezone for a job's time dependency as follows:

05 | J_RESOLVE_DNS(start=>'10:00', tz=>'America/New_York') ...

EXAMPLE 4 - Job Forests

You can see in the example above that line 03 is the start of a group of dependent jobs. No job on any other line can start unless the job on line 03 succeeds. What if you wanted two or more groups of jobs in the same family that start at the same time (barring any time dependencies) and proceed independently of each other?

To do this you would separate the groups with a line containing one or more dashes (only). Consider the following family:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS(start => '09:00')    Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
11 |-------------------------------------------------------
12 |
13 | J_UPDATE_ACCOUNTS_RECEIVABLE()
14 |
15 | J_ATTEMPT_CREDIT_CARD_PAYMENTS()
16 |
17 |-------------------------------------------------------
18 |
19 | J_SEND_EXPIRING_CARDS_EMAIL()
20 |
   +-------------------------------------------------------

Because of the lines of dashes on lines 11 and 17, the jobs on lines 03, 13 and 19 will all start at 02:00. These jobs are independent of each other. J_ATTEMPT_CREDIT_CARD_PAYMENTS will not run if J_UPDATE_ACCOUNTS_RECEIVABLE fails. That failure, however, will not prevent J_SEND_EXPIRING_CARDS_EMAIL from running.
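
The way a line of dashes splits a family into independent groups can be sketched in a few lines of Python. This is only an illustration: the function name split_into_groups is an assumption, the input is the body of a family file without its header line, and the sketch assumes that blank lines and # comments carry no meaning.

```python
def split_into_groups(lines):
    """Split family-file body lines into groups separated by dash-only lines."""
    groups, current = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue                          # skip blank lines
        if set(line) == {"-"}:                # a line of one or more dashes only
            groups.append(current)
            current = []
        else:
            current.append(line)
    groups.append(current)
    return [g for g in groups if g]           # ignore empty groups
```

For the family above this yields three groups, whose first lines are J_ROTATE_LOGS(), J_UPDATE_ACCOUNTS_RECEIVABLE() and J_SEND_EXPIRING_CARDS_EMAIL().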

Finally, you can specify a job to run repeatedly every 'n' minutes, as follows:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 | J_CHECK_DISK_USAGE(every=>'30', until=>'23:00')
04 |
   +-------------------------------------------------------

This means that J_CHECK_DISK_USAGE will be called every 30 minutes and will not run on or after 23:00. By default, the 'until' time is 23:59. If the job starts at 02:00 and takes 25 minutes to run to completion, the next occurrence will still start at 02:30, and not at 02:55. By default, every repeat occurrence will only have one dependency - the time - and will not depend on earlier occurrences running successfully or even running at all. If line 03 were:

J_CHECK_DISK_USAGE(every=>'30', until=>'23:00', chained=>1)

...then each repeat job will be dependent on the previous occurrence.
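
To see which start times a repeating job ends up with, the schedule can be computed by hand or with a short script. The Python sketch below lists the scheduled times for the unchained case, where each occurrence depends only on the clock. The function name repeat_times is an assumption for illustration; it does not model chained => 1 or time zones.

```python
def repeat_times(start, every, until="23:59"):
    """List the 'HH:MM' times at which a repeating job is scheduled.

    start: first run time, every: interval in minutes,
    until: the job does not run on or after this time.
    """
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    t, end = to_minutes(start), to_minutes(until)
    times = []
    while t < end:                       # strictly before 'until'
        times.append(f"{t // 60:02d}:{t % 60:02d}")
        t += every
    return times
```

For the family above, repeat_times("02:00", 30, "23:00") starts with 02:00 and 02:30 and ends with 22:30.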

Now, let's get back to our discussion of external dependencies from example 2. I said that an external dependency may only be specified on the first line of the file or the first line of a group of jobs. This way of specifying a family is not allowed by TaskForest:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |            LOGS::J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |            REPORTS::J_WEB_REPORTS()  # BAD!
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

With a few minor modifications, the family can be specified correctly:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |            LOGS::J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |--------------------------------------------
08 |
09 | J_RESOLVE_DNS() Delete_Old_Logs() REPORTS::J_WEB_REPORTS()
10 |  
11 |   J_EMAIL_WEB_RPT_DONE()  # send me a notification
12 |
   +-------------------------------------------------------
  

We've moved the external dependency to the first line of its own section. Now J_EMAIL_WEB_RPT_DONE relies on all 3 jobs: 2 that run in this Family, and one from the REPORTS family.

EXAMPLE 5 - Tokens

A token is a dependency. It is something that a job must 'possess' before it can run, if that job needs that token. You can create different types of tokens, giving each type a common name. You can also specify how many instances of tokens of each type are to exist. For example, if the configuration file contained the following lines:


   +-------------------------------------------------------
01 | ...   
02 | <token T>
03 |   number = 1
04 | </token>
05 | <token U>
06 |   number = 2
07 | </token>
08 | ...
   +-------------------------------------------------------

...it means that there are two types of tokens: 'T' and 'U'. There is only one instance of token type 'T', and two of type 'U'.

Given the above configuration, if your Family file looked as follows:


   +-------------------------------------------------------
01 |start => '00:00', tz => 'GMT', days => 'Mon,Wed,Fri'
02 |
03 | J1( token => 'T')  J2 ( token => 'T' ) J3()
04 |
05 |-------------------------------------------------------
06 |
07 | J6(token => 'U') J5(token => 'U') J4(token => 'U')
08 | J8(token => 'T,U')
09 | 
   +-------------------------------------------------------

...then that means that jobs J1 and J2 both need a token of type 'T' to run. But there's only one instance of token T, so J1 and J2 cannot both run at the same time (even though they would, if they didn't rely on tokens). The system will sort jobs alphabetically by name and choose the first in the list. In other words, in this case, J1 will run first and J2 will only run after J1 completes (if no other job has taken the token first). To be more accurate, J1 and J3 will run simultaneously, since J3 does not need any tokens.

To be even more accurate, J1, J3, J4 and J5 will run simultaneously. This is because J4, J5 and J6 all rely on token U, but there are only 2 instances of token U. Even though J6 appears on the line before J5 and J4, the system will choose J4 and J5 first, because they appear first in alphabetical order, and J6 will run after one of the other two has completed.

Because the system always chooses the job with the smallest name (alphabetically), it is possible to experience 'resource starvation' - where a job with a 'larger' name could never get an opportunity to run, because there are too many other jobs with smaller names that get to run first by virtue of their names. Future versions of TaskForest will implement heuristics to prevent resource starvation.

Note that J8 relies on two tokens: T and U. It will only run when it can acquire one token of each type. If it can acquire one, but not the other, it will release the first and try to acquire both at a later time.
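
The selection rule described above (sort the ready jobs by name, give each job its tokens only if all of them are available) can be simulated in a few lines. The following Python sketch is a simplified model written for this page, not TaskForest's implementation; the function name pick_runnable and the data layout are assumptions.

```python
def pick_runnable(ready_jobs, available):
    """Choose which ready jobs may start, given the free tokens.

    ready_jobs: dict of job name -> list of token names it needs.
    available:  dict of token name -> number of free instances.
    """
    available = dict(available)          # do not modify the caller's dict
    chosen = []
    for name in sorted(ready_jobs):      # alphabetical order decides priority
        needed = ready_jobs[name]
        # All-or-nothing: take tokens only if every one is free.
        if all(available.get(token, 0) > 0 for token in needed):
            for token in needed:
                available[token] -= 1
            chosen.append(name)
    return chosen
```

With one T and two U tokens, and J1 to J6 from the family above all ready to run, the sketch selects J1, J3, J4 and J5, which matches the behaviour described in this example. J8 is not included because it also has to wait for the jobs on line 07.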

Finally, tokens can also be used to control the load on the machine on which TaskForest is running. If you've got several independent jobs that don't depend on each other, but which use a fair amount of resources, you can have all the jobs use the same token. Then you can tweak the maximum number of instances of that token to a value that maximizes the number of simultaneous jobs without putting too much strain on the server.

EXAMPLE 6 - Calendars

A calendar is a set of rules that defines on what days a job may run. The rules that make up a calendar are specified in the configuration file and the calendars themselves are associated with a Family in the Family file.

Calendar names should contain only the characters shown below:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-

Let's start with the configuration file. You can have zero or more calendars in the configuration file. Each calendar may be associated with zero or more Families. A calendar consists of one or more rules. All rules belonging to a calendar are consulted in order to determine whether or not the Family should run today. The rules are consulted in the order in which they are specified in the configuration file. Rules may contradict each other. A rule may not conclusively determine whether or not the Family should run today. The last rule that determines whether or not a Family should run today will override any earlier rules. Rules are case insensitive.

Let's see some examples. First, let's see how a Family file specifies a calendar:

Example 1 - One day only
   +-------------------------------------------------------
01 |start => '00:00', tz => 'GMT', calendar => 'NY2010'
02 |
   | ...

You can see here that instead of the days => '...', we have calendar => 'NY2010'. This tells the system that this family will rely on the calendar named 'NY2010.' This calendar is defined in its own file. The file name should be the same as the calendar name. The directory in which the file exists is specified by the calendar_dir option:

   +-------------------------------------------------------
   | ...
   | calendar_dir = "/foo/bar/calendars"
   | ...

The file /foo/bar/calendars/NY2010 looks like this (the # symbols and anything after them are comments, just like in the configuration file):

   +-------------------------------------------------------
01 | # NY2010
02 | + 2010/01/01  # Only valid on New Years Day, 2010
   +-------------------------------------------------------

This calendar only has one rule. It is "+ 2010/01/01". The '+' in the rule says that if the date specified in this rule matches, then the Family must run on that day. The '+' is optional. This calendar will allow the Family that uses it to run on Jan 1, 2010, and on no other day. Dates in rules should be in the YYYY/MM/DD format.

Example 2 - One month only

What if we want a job to run on every day in November 2010? You can use a rule like this:

# Nov2010
        
 2010/11/*

The '*' in the DD part of the date is like a wildcard. It means that the DD part of the rule will match any number. In other words, if today's date is November DD, 2010, then this rule will match, for all values of DD. Note that the '+' is missing here. That's ok. It's optional, and if missing, the system will assume that you meant to put in a '+'.

Example 3 - Rejecting days

If, on the other hand, you wanted the Families that use this calendar to run on all days in 2010 except all of November, you would use the '-' sign:

# All_But_Nov2010
        
 + 2010/*/*
 - 2010/11/*

The first line matches all days in 2010. The MM and DD parts are both wildcards. The optional plus tells the system that if the date matches this pattern, it should count as a valid run date. The next line, on the other hand, adds an exception to this rule: if the date falls within November 2010, the date should not be a valid run date - note the '-' sign that tells the system to exclude this date.
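
The "last matching rule wins" behaviour of date rules can be illustrated with a short Python sketch. It handles only YYYY/MM/DD rules with optional '+' or '-' and '*' wildcards, and silently skips anything else. The function names are assumptions for this example, and the code is not taken from TaskForest.

```python
import datetime

def parse_rule(text):
    """Return (sign, [YYYY, MM, DD]) or None for blank/comment lines."""
    text = text.split("#", 1)[0].strip()
    if not text:
        return None
    sign = "+"                                  # '+' is assumed when missing
    if text[0] in "+-":
        sign, text = text[0], text[1:].strip()
    return sign, text.split("/")

def should_run(rules, day):
    """Apply the rules in order; the last rule that matches decides."""
    parts = [day.year, day.month, day.day]
    verdict = False                             # no matching rule: do not run
    for line in rules:
        rule = parse_rule(line)
        if rule is None or len(rule[1]) != 3:
            continue                            # not a YYYY/MM/DD rule
        sign, pattern = rule
        if all(p == "*" or int(p) == v for p, v in zip(pattern, parts)):
            verdict = (sign == "+")
    return verdict
```

With the two rules of All_But_Nov2010, a date in October 2010 is accepted, a date in November 2010 is rejected by the second rule, and a date in 2011 matches no rule and is therefore rejected as well.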

Example 4 - Daily Calendar

To specify a daily calendar, use this:

# Daily

*/*/*

Of course, you don't have to name the calendar 'Daily.' You can name it whatever you want. Using this calendar is equivalent to having days=>'Mon,Tue,Wed,Thu,Fri,Sat,Sun' in the Family file.

Example 5 - Specifying days of the week

You can also specify rules that specify days of the week with a qualifier. For example, to run a job on the first Monday of every month in 2009, you should use a rule like this:

# FirstMon09

+ first Mon 2009/*

A couple of points to mention here: First, the '+' is optional here as well. Second, the date part of the rule only has the year and the month (in YYYY/MM format). When you use a qualifier like 'first,' it makes no sense to say things like 'The first Monday of the 1st of every month.'

The word 'first' in line 2 above is called a qualifier. Valid qualifiers are:

The qualifiers 'first last', 'last last' and 'every last' also work in version 1.25, but they may stop working in a future version, so don't get into the habit of using those qualifiers.

Unlike the 'days' specifier in the family file, the days of the week in calendar rules may be spelled out. Only the first 3 characters are significant.
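
To check which dates a day-of-the-week rule would select, the Python sketch below computes them for one month. It is an illustration only: the function name matching_days is invented, and the set of qualifiers it accepts is an assumption. 'first', 'second', 'fourth', 'last' and 'every' appear in rules on this page, while 'third' and 'fifth' are included by analogy.

```python
import calendar
import datetime

DAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3,
             "fri": 4, "sat": 5, "sun": 6}
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

def matching_days(qualifier, day_name, year, month):
    """Dates in the given month selected by e.g. 'first Mon'."""
    # Only the first 3 characters of the day name are significant.
    weekday = DAY_INDEX[day_name[:3].lower()]
    last_day = calendar.monthrange(year, month)[1]
    hits = [datetime.date(year, month, d)
            for d in range(1, last_day + 1)
            if datetime.date(year, month, d).weekday() == weekday]
    q = qualifier.lower()                # rules are case insensitive
    if q == "every":
        return hits
    if q == "last":
        return hits[-1:]
    index = ORDINALS[q]
    return hits[index:index + 1]         # empty if the month has no such day
```

For instance, matching_days("first", "Mon", 2009, 1) selects 5 January 2009, the first Monday of that month.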

Calendar Recipes

The following 'recipes' show you some useful calendar rules:

# ############################################################
# Run every day
#
+ */*/*
# ############################################################
# Run on weekdays only
#
*/*/*
- every Saturday */*
- every Sunday */*

# You could also replace the 3 lines above
# with 5 '+' lines, one for each weekday.
# ############################################################
# 'Thanksgiving Day' observed in the U.S.
#
fourth Thursday */11
# ############################################################
# 'Thanksgiving Day' observed in Canada
#
second Monday */10
# ############################################################
# 'Memorial Day' observed in the U.S.
#
last Monday */5
# ############################################################
# The day Daylight Saving Time starts in the U.S.
#
second Sun */03  # this rule is valid for dates
                 # after 2007, but not earlier
# ############################################################
# The day Daylight Saving Time ends in the U.S.
#
first Sun */11   # this rule is valid for dates
                 # after 2007, but not earlier
# ############################################################
# The day Daylight Saving Time starts in Europe
#
last Sun */03    # tested for 2009
# ############################################################
# The day Daylight Saving Time ends in Europe
#
last Sun */10    # tested for 2009