To run this example, pick a name for a test directory, say testa. Then:
cp -r schedMain/example.static testa
cd testa
.../schedMain.py -globalDir global -ancDir . -initWork initWork -delaySec 1 -redoAll n
The command line parameters are:
-bugLev <string>      Debug level. Typically 0, 1, or 5.
-hostType <string>    System type: hostLocal or peregrine or ...
-globalDir <string>   Dir containing global info, including subdir cmd
-ancDir <string>      An ancestor dir of all dirs to be processed
-initWork <string>    File containing the initial work list
-delaySec <string>    Schedule loop delay, seconds
-redoAll <bool>       n/y: on restart, redo all even if prior run was ok
-useReadOnly <bool>   n/y: only print status; do not start tasks
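For readers who want to see how such single-dash options can be handled, here is a minimal sketch using Python's standard argparse module. It is not schedMain's actual parsing code: the helper names yes_no and build_parser, the defaults, and the conversion of -bugLev and -delaySec to numbers are all assumptions made for illustration.

```python
import argparse


def yes_no(text):
    """Convert the n/y values used on the command line to a bool."""
    if text not in ("n", "y"):
        raise argparse.ArgumentTypeError("expected n or y, got %r" % text)
    return text == "y"


def build_parser():
    # Hypothetical parser. Option names come from the list above;
    # the defaults and numeric conversions are assumptions.
    parser = argparse.ArgumentParser(prog="schedMain.py")
    parser.add_argument("-bugLev", type=int, default=0)
    parser.add_argument("-hostType", default="hostLocal")
    parser.add_argument("-globalDir", required=True)
    parser.add_argument("-ancDir", required=True)
    parser.add_argument("-initWork", required=True)
    parser.add_argument("-delaySec", type=float, default=1.0)
    parser.add_argument("-redoAll", type=yes_no, default=False)
    parser.add_argument("-useReadOnly", type=yes_no, default=False)
    return parser


# Parse the same arguments used in the example command above.
example_args = build_parser().parse_args(
    "-globalDir global -ancDir . -initWork initWork "
    "-delaySec 1 -redoAll n".split()
)
```

With argparse, an option declared as -globalDir is stored under the attribute globalDir, so example_args.globalDir holds "global" here.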
The possible task status values are:
init The task is on the work list but not yet started. Either it was just added to the work list, and soon will start, or it has unsatisfied prerequisites.
submit The task has been submitted to the HPC via qsub, msub, or similar, but has not yet been recognized by the HPC.
wait The task has been submitted to the HPC via qsub, msub, or similar, but has not yet started.
start The task has started.
ok The task finished successfully and wrote the file taskName.status.ok.
error The task finished but with an error. Generally the error message is in file taskName.status.error.
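The two finished states are signalled through files, so they can be detected from the task directory alone. The following is a rough sketch of that check, not the scheduler's own code. The function name finished_status is hypothetical, and it assumes the status files live in the task's directory, that taskName is the executable name without its extension (as with alpha.sh and alpha.status.ok), and that the ok file wins if both files exist.

```python
import os


def finished_status(task_dir, task_name):
    """Return 'ok', 'error', or None, based on the status files present.

    Assumption: the files taskName.status.ok and taskName.status.error
    are written into the task's own directory.
    """
    ok_file = os.path.join(task_dir, task_name + ".status.ok")
    error_file = os.path.join(task_dir, task_name + ".status.error")
    if os.path.exists(ok_file):
        return "ok"
    if os.path.exists(error_file):
        return "error"
    # Neither file exists: the task is in one of the earlier states
    # (init, submit, wait, start), which this sketch cannot distinguish.
    return None
```

For example, finished_status("aaDir", "alpha") would return "ok" once aaDir/alpha.status.ok exists.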
In this example you should see output like the following. The lines beginning with “#” are annotations added afterward, not scheduler output:
# This is the initial work list. Here schedMain has
# just read the file initWork.
schedMain
task counts: init:1
execName  jobId  new  status   npre  taskDir
--------  -----  ---  ------   ----  -------
alpha.sh  None   new  init   *    0  aaDir
scheduleTasks: start task: alpha.sh taskDir: aaDir
# After alpha.sh completes, the work list is as follows.
# schedMain noticed the file alpha.status.ok, read alpha.postOkWork,
# and added the new tasks to the work list.
# The "npre" column is the number of unsatisfied prerequisites.
# Here gamma cannot start until the 3 betas complete.
# The "*" indicates that the task is ready to start.
schedMain
task counts: init:4 ok:1
execName  jobId  new  status   npre  taskDir
--------  -----  ---  ------   ----  -------
alpha.sh  None   new  ok          0  aaDir
beta.py   None   new  init   *    0  bbDir0
beta.py   None   new  init   *    0  bbDir1
beta.py   None   new  init   *    0  bbDir2
gamma.py  None   new  init        3  ccDir
# schedMain starts all the ready tasks: the three betas.
# Gamma cannot start yet since its prerequisites, listed in
# gamma.preWork, are the betas.
scheduleTasks: start task: beta.py taskDir: bbDir0
scheduleTasks: start task: beta.py taskDir: bbDir1
scheduleTasks: start task: beta.py taskDir: bbDir2
# The betas finish almost as soon as they start,
# and gamma's prerequisites are finally satisfied.
scheduleTasks: start task: gamma.py taskDir: ccDir
# All done.
# If schedMain ends and some tasks still have "init" status,
# most likely it's because their prerequisites aren't
# satisfied; perhaps some prior task failed.
schedMain
task counts: ok:5
execName  jobId  new  status   npre  taskDir
--------  -----  ---  ------   ----  -------
alpha.sh  None   new  ok          0  aaDir
beta.py   None   new  ok          0  bbDir0
beta.py   None   new  ok          0  bbDir1
beta.py   None   new  ok          0  bbDir2
gamma.py  None   new  ok          0  ccDir
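The "npre" column and the "*" marker in the listings above come down to simple bookkeeping. Here is a small sketch of that logic, under the assumption that a prerequisite counts as satisfied only when its status is "ok". The function names and the task ids beta0, beta1, beta2, and gamma are made up for illustration and are not schedMain's internal names.

```python
def count_unsatisfied(prereqs, status_by_task):
    """Number of prerequisite tasks that have not reached status 'ok'."""
    return sum(1 for name in prereqs if status_by_task.get(name) != "ok")


def ready_tasks(prereqs_by_task, status_by_task):
    """Tasks still in 'init' whose prerequisites are all satisfied."""
    return [
        task
        for task, prereqs in prereqs_by_task.items()
        if status_by_task.get(task) == "init"
        and count_unsatisfied(prereqs, status_by_task) == 0
    ]


# Hypothetical task ids mirroring the example: gamma depends on the betas.
prereqs_by_task = {
    "beta0": [],
    "beta1": [],
    "beta2": [],
    "gamma": ["beta0", "beta1", "beta2"],
}
status_by_task = {name: "init" for name in prereqs_by_task}
```

At this point gamma has three unsatisfied prerequisites and only the betas are ready. Once the three betas are marked "ok", gamma's count drops to zero and it becomes the only ready task.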
The “new” flag means that the task actually ran. If you start the scheduler again in this directory, with the same command as above, you will see:
schedMain
task counts: ok:5
execName  jobId  new  status   npre  taskDir
--------  -----  ---  ------   ----  -------
alpha.sh  None        ok          0  aaDir
beta.py   None        ok          0  bbDir0
beta.py   None        ok          0  bbDir1
beta.py   None        ok          0  bbDir2
gamma.py  None        ok          0  ccDir
Notice the lack of “new” flags. The scheduler found the x.status.ok file for each task x and concluded that the task did not need to be rerun.
If you want to force the scheduler to rerun all tasks even if they completed OK, specify the command line flag -redoAll y.
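The restart behavior described above can be summarized in a few lines. This is only a sketch of the decision, not the scheduler's implementation. The function name needs_run is hypothetical, and it assumes the status file sits in the task's directory and is named after the executable without its extension.

```python
import os


def needs_run(task_dir, task_name, redo_all=False):
    """Decide whether a task should be run on restart.

    redo_all corresponds to the command line flag -redoAll y.
    Assumption: taskName.status.ok lives in the task's own directory.
    """
    if redo_all:
        # Rerun everything, even tasks that completed OK earlier.
        return True
    ok_file = os.path.join(task_dir, task_name + ".status.ok")
    # Without -redoAll y, an existing ok file means the task is skipped.
    return not os.path.exists(ok_file)
```

For instance, after the first run above, needs_run("aaDir", "alpha") would be False, while needs_run("aaDir", "alpha", redo_all=True) would be True.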