Organized high-throughput calculations: job-folders

Pylada provides tools to organize high-throughput calculations in a systematic manner. The whole high-throughput experience revolves around job-folders. These are convenient ways of organizing actual calculations. They can be thought of as folders on a file system, or directories in unix parlance, each one dedicated to running a single actual calculation (e.g. launching VASP once). The added benefits beyond creating the same file structure with bash are:

  1. the ability to create a tree of folders/calculations using the power of the python programming language. No more copy-pasting files and unintelligible bash scripts!
  2. the ability to launch all folders simultaneously
  3. the ability to collect the results across all folders simultaneously, all within python, and with all of python’s goodies. E.g. no more copy-pasting into excel by hand. Just do the summing, and multiplying, and graphing there and then.

Actually, there are a lot more benefits. Having everything - from input to output - within the same modern and efficient programming language means there is no limit to what can be achieved.

The following describes how job-folders are created. The fun bits (launching jobs, collecting results, manipulating all job-folders simultaneously) can be found in the next section, since all of these are intrinsically linked to Pylada’s IPython interface.

Prep

First off, we will need a functional. Rather than use something heavy, like VASP, we will use a dummy functional which does pretty much nothing. Please copy the following into a file, preferably named dummy.py, since later examples import it under that name. Putting it into a file is important because we will want python to be able to refer to it later on.

def functional(structure, outdir=None, value=False, **kwargs):
  """ A dummy functional """
  from copy import deepcopy
  from pickle import dump
  from random import random
  from pylada.misc import Changedir

  structure = deepcopy(structure)
  structure.value = value
  with Changedir(outdir) as pwd:
    # binary mode: pickle writes bytes
    with open('OUTCAR', 'wb') as stream:
      dump((random(), structure, value, functional), stream)
  return structure

This functional takes a few arguments, amongst which an output directory, and writes a file to disk. That’s pretty much it.
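
For readers who want to try this pattern without Pylada installed, here is a minimal sketch of the same idea using only the standard library. The names changedir and toy_functional are hypothetical stand-ins, and the behaviour of changedir (create the directory if needed, switch into it, switch back on exit) is an assumption about what Changedir does, not a copy of it.

```python
import os
import pickle
import tempfile
from contextlib import contextmanager
from random import random


@contextmanager
def changedir(path):
    """Hypothetical stand-in for pylada.misc.Changedir (assumed behaviour)."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)  # create the output directory if missing
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)  # always return to where we started


def toy_functional(structure, outdir=".", value=False, **kwargs):
    """Same shape as the dummy functional, minus the Pylada dependency."""
    with changedir(outdir):
        with open("OUTCAR", "wb") as stream:  # pickle needs binary mode
            pickle.dump((random(), structure, value), stream)
    return structure


# Run once in a scratch directory, then read the result back.
scratch = tempfile.mkdtemp()
outdir = os.path.join(scratch, "jobA")
toy_functional({"name": "zinc-blende"}, outdir=outdir, value=5)
with open(os.path.join(outdir, "OUTCAR"), "rb") as stream:
    number, structure, value = pickle.load(stream)
```

Reading the file back with pickle.load is also how one would inspect the OUTCAR written by the real dummy functional, provided dummy.py is importable at that point.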

Creating and accessing job-folders

Job-folders can be created with two simple lines of code:

>>> from pylada.jobfolder import JobFolder
>>> root = JobFolder()

To add further job-folders, one can do:

>>> jobA = root / 'jobA'
>>> jobB = root / 'another' / 'jobB'
>>> jobBprime = root / 'another' / 'jobB' / 'prime'

As you can see, job-folders can be given any structure that on-disk directories can. What is more, a job-folder can access other job-folders with the same kind of syntax that one would use (on unices) to access other directories:

>>> jobA['/'] is root
True
>>> jobA['../another/jobB'] is jobB
True
>>> jobB['prime'] is jobBprime
True
>>> jobBprime['../../'] is not jobB
True
>>> root['..']
KeyError: 'Cannot go below root level.'

Furthermore, job-folders know what they are:

>>> jobA.name
'/jobA/'

What their parents are:

>>> jobB.parent.name
'/another/'

And what the root is:

>>> jobBprime.root is root
True
>>> jobBprime.root.name
'/'

They also know what they contain:

>>> 'prime' in jobB
True
>>> '/jobA' in jobBprime
True
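
To make the navigation rules above concrete, here is a toy tree that reproduces them in plain python. It is only a sketch: ToyFolder and its children attribute are hypothetical, and nothing here is taken from Pylada's actual JobFolder implementation.

```python
class ToyFolder:
    """Hypothetical miniature of a job-folder tree; not Pylada's JobFolder."""

    def __init__(self, parent=None, label=""):
        self.parent = parent
        self.label = label
        self.children = {}

    @property
    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    @property
    def name(self):
        if self.parent is None:
            return "/"
        return self.parent.name + self.label + "/"

    def __truediv__(self, label):
        """folder / 'label' returns the child, creating it on first use."""
        if label not in self.children:
            self.children[label] = ToyFolder(self, label)
        return self.children[label]

    def __getitem__(self, path):
        """Resolve a unix-style path, absolute or relative to this folder."""
        node = self.root if path.startswith("/") else self
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                if node.parent is None:
                    raise KeyError("Cannot go below root level.")
                node = node.parent
            else:
                node = node.children[part]
        return node

    def __contains__(self, path):
        try:
            self[path]
        except KeyError:
            return False
        return True


root = ToyFolder()
job_a = root / "jobA"
job_b = root / "another" / "jobB"
job_b_prime = root / "another" / "jobB" / "prime"
```

With these definitions, job_a['../another/jobB'] resolves to job_b, and root['..'] raises a KeyError, mirroring the session shown above.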

Making a job-folder executable

The whole point of a job-folder is to create an architecture for calculations. Each job-folder can contain at most a single calculation. A calculation is set up by passing to the job-folder a function and the parameters for calling it.

>>> from pylada.crystal.binary import zinc_blende
>>> from dummy import functional
>>>
>>> jobA.functional = functional
>>> jobA.params['structure'] = zinc_blende()
>>> jobA.params['value'] = 5

In the above, the function functional from the dummy module created previously is imported into the namespace. The special attribute job.functional is set to functional. Two arguments, structure and value, are specified by adding them to the dictionary job.params. Please note that the line setting jobA.functional does not contain parentheses: this is not a function call, it merely saves a reference to the function with the aim of calling it later. ‘C’ aficionados should think of it as saving a pointer to a function.

Warning

The reference to functional is deepcopied: the instance that is saved to the job-folder is not necessarily the one that was passed to it. On the other hand, the parameters (jobA.params) are held by reference rather than by value.

Tip

To force a job-folder to hold a functional by reference rather than by value, do:

>>> jobA._functional = functional
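
The difference between "by value" and "by reference" can be seen with plain python objects. The sketch below does not use Pylada at all; Scaler and plain_function are hypothetical stand-ins for a functional. It also shows why the copy is "not necessarily" a different object: copy.deepcopy hands back plain functions unchanged, whereas instances of callable classes are duplicated.

```python
from copy import deepcopy


class Scaler:
    """Hypothetical callable class standing in for a functional."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return self.factor * value


def plain_function(value):
    return value


original = Scaler(2)
stored = deepcopy(original)   # what "held by value" means
original.factor = 10          # a later change to the original...
# ...does not reach the stored copy: stored.factor is still 2

params = {"value": [1, 2]}
alias = params                # what "held by reference" means
params["value"].append(3)     # this change is visible through every alias
```

So a callable instance modified after being assigned to a folder would leave the folder's copy untouched, while a change to an object placed in the parameters would be seen by the folder.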

The parameters in job.params should be pickleable so that the folder can be saved to disk later. functional must be a pickleable callable. Setting functional to something else will immediately fail. In practice, this means it can be a function or a callable class, as long as that function or class is imported from a module. It cannot be defined in __main__, e.g. the script that you run to create the job-folders:

>>> run -i jobscript.py # functional must be defined outside jobscript.py.

However, if jobscript is imported as a module, and the job-folders are created via a function, then functional can be defined inside jobscript.py:

>>> import jobscript
>>> newjobs = jobscript.create_my_jobfolders() # functional can be defined in jobscript.py
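
The restriction can be reproduced with the standard library alone. Pickle does not store the code of a function, only the name of the module it lives in and its name within that module. A function defined in the script being run is recorded as belonging to __main__, and a different process loading the pickle has a different __main__. The sketch below uses json.dumps purely as an example of an importable function.

```python
import json
import pickle

# A function importable from a module pickles to little more than
# its module name and its own name.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)  # looked up again by name on loading

# A lambda has no importable name, so pickle refuses it.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

Here restored is the very same object as json.dumps, because unpickling simply imports the module and fetches the attribute.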

These complications are due to the way python pickles data, and pickling is what allows job-folders to be saved to disk.

The functional is called with the parameters stored in the folder as keyword arguments. In other words:

>>> jobA.compute(outdir=jobA.name[1:])

is exactly equivalent to:

>>> functional(structure=zinc_blende(), value=5, outdir='jobA')

Note that we have passed an extra argument outdir, which is the output directory. It is customary to set it to the name of the job (minus the leading /). Either of the two previous commands will create a “jobA” sub-directory in the current directory.
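
The equivalence can be pictured with a few lines of plain python. This is a sketch of the idea, not Pylada's code: ToyJob and echo are hypothetical, and both the merging of call-time keywords over stored parameters and the None returned by an empty folder are assumptions made for illustration.

```python
class ToyJob:
    """Hypothetical stand-in for an executable job-folder."""

    def __init__(self):
        self.functional = None
        self.params = {}

    def compute(self, **kwargs):
        if self.functional is None:
            return None  # assumption: nothing to run in an empty folder
        arguments = dict(self.params)  # stored parameters...
        arguments.update(kwargs)       # ...plus call-time extras
        return self.functional(**arguments)


def echo(**kwargs):
    """Hypothetical functional that just reports what it received."""
    return kwargs


job = ToyJob()
job.functional = echo
job.params["structure"] = "zinc-blende placeholder"
job.params["value"] = 5
received = job.compute(outdir="jobA")
```

The functional ends up seeing structure, value and outdir side by side, exactly as if it had been called directly with all three keywords.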

Tip

Executable folders can be iterated the same way dictionaries can, with keys(), iterkeys(), values(), itervalues(), items(), iteritems().

Saving and loading folders

The IPython interface provides better ways to do both. However, it is still possible to load and save job-folders to disk from a script:

>>> from pylada.jobfolder import load, save
>>> save(root, 'root.dict') # saves to file
>>> root = load('root.dict') # loads from file

The file format is a pickle. It is not meant for human eyes. However, it can be transferred from one computer to the next. The parameters job.params should be pickleable, as well as the functional, for this to work. The advantage of using these two functions is that they take care of locking access to the file on disk before reading or writing to it. This way, multiple processes can access the file without fear of getting in one another’s way.

Tip

If either load or save takes forever, check whether the lock-directory “.filename-pylada_lockdir” exists. If you are sure that no other process exists which is trying to access the file on disk, then you can delete the lock-directory and try saving/loading again. Alternatively, a timeout argument can be provided to raise an exception if the file cannot be locked.
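
For the curious, locking through a directory can be sketched in a few lines. What follows is an illustration of the general technique, not Pylada's implementation: lock_directory, toy_save and toy_load are hypothetical names, the polling interval is arbitrary, and only the lock-directory naming pattern is borrowed from the tip above. The trick is that os.mkdir either creates the directory or fails, so only one process at a time can succeed.

```python
import os
import pickle
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_directory(path, timeout=None, poll=0.05):
    """Hold a lock on `path` by creating a sibling lock-directory."""
    folder, filename = os.path.split(os.path.abspath(path))
    lockdir = os.path.join(folder, "." + filename + "-pylada_lockdir")
    start = time.monotonic()
    while True:
        try:
            os.mkdir(lockdir)  # succeeds for one process only
            break
        except FileExistsError:
            if timeout is not None and time.monotonic() - start > timeout:
                raise TimeoutError("could not lock " + path)
            time.sleep(poll)  # someone else holds the lock: wait and retry
    try:
        yield lockdir
    finally:
        os.rmdir(lockdir)  # release the lock


def toy_save(obj, path, timeout=None):
    with lock_directory(path, timeout):
        with open(path, "wb") as stream:
            pickle.dump(obj, stream)


def toy_load(path, timeout=None):
    with lock_directory(path, timeout):
        with open(path, "rb") as stream:
            return pickle.load(stream)


target = os.path.join(tempfile.mkdtemp(), "root.dict")
toy_save({"jobA": {"value": 5}}, target)
loaded = toy_load(target)
```

This also shows why a stale lock-directory makes loading hang: without a timeout, the loop above keeps waiting until somebody removes the directory.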