Configuring Pylada

Environment Variables

PYLADA_CONFIG_DIR

Environment variable specifying the path(s) to the configuration directories.

PYLADA_DATA_DIRECTORY

Environment variable specifying the path to the root of the data directories.

PYLADA_TMPDIR

Optional environment variable specifying the path to the root of temporary directories Pylada might need to create.

Configuration variables

Configuration variables exist in the pylada module itself. However, they can also be set from separate files; exactly which files are read depends on the installation and the user:

  • Files located in the config sub-directory where pylada is installed
  • Files located in one of the directories specified by PYLADA_CONFIG_DIR
  • The user configuration file ~/.pylada

Each file is executed, and whatever it declares is placed directly at the root of the pylada package. The files are read in the order given above; within a given directory, files are read alphabetically. Later files override earlier ones.
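
For instance, a user override placed in ~/.pylada might look as follows (the values shown are purely illustrative):

verbose_representation = True
qsub_exe = "sbatch"

Both names become attributes of the pylada package itself, e.g. pylada.verbose_representation.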

General

pylada.verbose_representation

Whether functionals should be represented/printed verbosely, e.g. with each and every attribute, or whether attributes which have not changed from the default should be stripped. The former is safer, since defaults may change over time, in which case a stripped representation can become inaccurate. Defaults to False.

pylada.ipython_verbose_representation

When in ipython, and if not None, verbose_representation is changed to this value. This makes output a bit easier on the eyes in ipython while keeping things accurate during actual calculations. Ignored if None. Defaults to False.

Note

Only taken into account at ipython start-up. It is ignored if Pylada is launched within python.

CRYSTAL

These are the variables generally declared in config/dftcrystal.py.

pylada.crystal_inplace

Whether calculations should be run directly in the output directory, or in a temporary directory. The latter avoids clutter, since only a small set of files is copied back to the output directory. Note that some files (notably “crystal.out”) are created in the output directory from the start and linked into the temporary directory. As such, these files will always be there, even if a job is forcefully killed before Pylada has had a chance to copy things back. If crystal_inplace is False, then the files are placed in a temporary directory. This temporary directory is itself located within PYLADA_TMPDIR (if the environment variable exists), or within PBS_TMPDIR (if that exists), or in the system's default temporary directory. A special link, workdir, is created within the output directory for the duration of the CRYSTAL run.

pylada.crystal_program

It can be a string defining the path to the serial CRYSTAL program. Or it can be a callable which takes three arguments and returns the path to the appropriate CRYSTAL program. The default is a callable of this kind.

The three arguments describe, as accurately as possible, the job for which CRYSTAL is launched. The first, self, is the functional making the call, or None. The second, structure, is the crystal structure to be computed, or None. The third, comm, is a dictionary defining the MPI call, if any. It contains, for instance, the number of processors on which CRYSTAL should be run.
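
For illustration, such a callable might be sketched as below; the executable names “crystal” and “Pcrystal” stand in for the serial and MPI builds on a given machine, and the exact default shipped with Pylada may differ:

def crystal_program(self=None, structure=None, comm=None):
  """ Path to a CRYSTAL executable (sketch only). """
  # The serial code must be returned when self is None (see note below).
  if self is None or comm is None or comm.get('n', 1) == 1:
    return 'crystal'
  return 'Pcrystal'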

Note

It is important that the path to the serial code be returned when self is None, as it allows Pylada to perform the crystalline transforms using CRYSTAL directly, and hence to interpret a crystal structure exactly as CRYSTAL would.

pylada.properties_program

A string defining the path to the properties program. By default, it is “properties”, expecting the program to be in the path.

VASP

These variables are generally declared in config/vasp.py.

pylada.is_vasp_4

If it exists and is True, some vasp parameters will fail when used with VASP 5-only options. If it does not exist or is False, then these parameters are allowed.

pylada.vasp_program

Signifies which vasp executable to use. It can take the following values:

  • string: Should be the path to the vasp executable. It can be either a full path, or an executable within the environment’s $PATH variable.

  • callable: The callable is invoked with a Vasp instance as its first argument, the structure upon which the calculation is performed as its second, and the communicator as its last. It should return a string, as described above. In other words, different vasp executables can be used depending on the type of calculation and on the system.

    For instance, the following function chooses between a normal vasp and vasp compiled for perturbative spin-orbit calculations:

    def vasp_program(vasp, structure, comm):
      """ Path to the vasp executable.

          Returns a vasp compiled for spin-orbit if lsorbit is True.
          Otherwise, returns the path to the normal vasp.
      """
      return "vasp-4.6-nc" if getattr(vasp, 'lsorbit', False) else "vasp-4.6"
    
pylada.vasp_has_nlep

Defaults to False. If NLEP [*] should be allowed, then this parameter should be set to True.

[*] Phys. Rev. B 77, 241201(R) (2008)
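
Both VASP flags above are plain booleans and can be set in any configuration file, for instance:

is_vasp_4 = True       # a VASP 4.x binary: reject VASP 5-only options
vasp_has_nlep = False  # NLEP support not compiled in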

External programs

These variables are generally declared in config/mpi.py.

pylada.mpirun_exe

Format string used to launch MPI programs. It accepts program, placement, n, and ppn as named arguments, as well as anything else you want to throw at it:

  • program: program (with commandline arguments) to launch
  • placement: used to place an executable on specific nodes and processors
  • n: number of processes to launch program
  • ppn: number of processes per node

In general, it takes the following form:

mpirun_exe = "mpirun -n {n} {placement} {program}"

The actual command line is executed by launch_program(). The latter uses Popen to execute a command line obtained through the format method of a python string. The arguments to format are those mentioned above, as well as anything passed on to launch_program().
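
As an illustration (values here are hypothetical), the substitution performed on mpirun_exe amounts to:

mpirun_exe = "mpirun -n {n} {placement} {program}"
comm = {'n': 8, 'ppn': 4, 'placement': ''}
# Extra keys such as ppn are simply ignored by format.
commandline = mpirun_exe.format(program="vasp", **comm)
# commandline == "mpirun -n 8  vasp", which is then handed to Popen.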

pylada.default_comm

A dictionary with n and ppn, as well as any other variables to be used in conjunction with mpirun_exe.
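
For example, with illustrative values:

default_comm = {'n': 2, 'ppn': 4, 'placement': ''}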

pylada.do_multiple_mpi_programs

A boolean defining whether to attempt to figure out which machines pylada can run on. This is only necessary if you will run different MPI executables simultaneously within the same PBS job.

pylada.figure_out_machines

A string which defines a python program to get the hostnames of the machines Pylada can run on. This program must print to the standard output the names of the machines, one per line, and nothing else. Defaults to:

figure_out_machines =  'from socket import gethostname\n'                 \
                       'from boost.mpi import gather, world\n'            \
                       'hostname = gethostname()\n'                       \
                       'results = gather(world, hostname, 0)\n'           \
                       'if world.rank == 0:\n'                            \
                       '  for hostname in results:\n'                     \
                       '    print "PYLADA MACHINE HOSTNAME:", hostname\n' \
                       'world.barrier()\n'

pylada.modify_global_comm(pylada.process.mpi.Communicator) → None

Called after figuring out the hostnames of the nodes Pylada should run on. It is a callable taking the global communicator as its sole input. It should modify the communicator such that placement can make sense of it and issue the correct placement configuration to the mpirun program. By default, this function does nothing.

pylada.placement(pylada.process.mpi.Communicator) → str

Callable which takes a Communicator and returns a string telling the mpirun program which nodes to run on. The string is substituted for “{placement}” in mpirun_exe. In most cases (including the default), this means writing a machine file to disk and telling mpirun where it is with “-machinefile”.
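
As a sketch, assuming the communicator exposes a machines dictionary mapping hostnames to the number of processes to run on each (attribute names may differ from the actual Communicator API):

def placement(communicator):
  """ Writes a machine file and points mpirun at it (sketch only). """
  from tempfile import NamedTemporaryFile
  with NamedTemporaryFile('w', delete=False, prefix='pylada_machines') as mfile:
    # Assumed attribute: hostname -> number of processes.
    for hostname, n in communicator.machines.items():
      mfile.write('{0} slots={1}\n'.format(hostname, n))
  return '-machinefile {0}'.format(mfile.name)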

Job-folder

pylada.jobparams_readonly

Whether instances of ForwardingDict are read-only by default. In practice, objects which use forwarding dictionaries generally dictate whether they should be read-only or not, depending on what these objects do. At present, this parameter should have no effect.

pylada.jobparams_naked_end

Whether mass collectors and manipulators, such as JobParams, should return an object as is, rather than a ForwardingDict, when it is the only item left. Practical when checking results in ipython, not so much when writing scripts.

pylada.jobparams_only_existing

Whether, when setting parameters with JobParams, new attributes should be created for those items which do not possess that attribute, or whether JobParams should content itself with only modifying pre-existing attributes. Beware if set to True.

pylada.unix_re

Whether mass collectors and manipulators, such as JobParams, accept regexes as indices, or whether to use bash-like substitutions. The former is more powerful, the latter much simpler.
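
For instance, selecting the same jobs both ways (folder names are hypothetical):

jobparams['GaAs/*']   # bash-like wildcard
jobparams['GaAs/.*']  # python regular expression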

Computational resources and job submission

pylada.qsub_exe

Path to qsub. Can be relative. Defaults to “qsub”.

pylada.qsub_array_exe

A format string to launch PBS arrays.

If not None, it should be a tuple consisting of the command to launch job arrays and the name of the environment variable holding the job index.

>>> qsub_array_exe = 'qsub -J 1-{nbjobs}', '$PBS_ARRAY_INDEX'

The format {array} will receive the arrays to launch.

Note

Slurm does not do job-arrays.

pylada.pbs_string

String from which to create pbs/slurm submission scripts. For instance, the following is for the slurm resource manager:

pbs_string = "#! /bin/bash/\n"                  \
             "#SBATCH --account={account}\n"    \
             "#SBATCH --time={walltime}\n"      \
             "#SBATCH -N {nnodes}\n"            \
             "#SBATCH -e {err}.%j\n"            \
             "#SBATCH -o {out}.%j\n"            \
             "#SBATCH -J {name}\n"              \
             "#SBATCH -D {directory}\n\n"       \
             "python {scriptcommand}\n"

There are a number of keywords which should appear:

  • walltime: defines how long the job should run. It will generally be provided when calling launch in ipython.
  • n: The number of processes to request from the resource manager.
  • nnodes: The number of nodes to request from the resource manager. Generally, it will be generated automatically from n and default_pbs's relevant information.
  • err: A file where to log errors from this job. This filename will be generated automatically.
  • out: A file where to log output from this job. This filename will be generated automatically.
  • name: The name of the job. Also generated automatically.
  • directory: The directory where the job will take place. Also generated automatically.
  • scriptcommand: You do want something to happen, right? Generated automatically.
  • account: Relevant to slurm only. Selected by user when launching job.

Any number of parameters can be further provided, as long as they exist in default_pbs.

pylada.default_pbs

A dictionary which contains the parameters relevant to pbs_string. Additionally, it should contain:

  • ppn: Number of processes per node.
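
For instance, to match the slurm template above (account name and values are illustrative):

default_pbs = { 'account': 'XY1234', 'walltime': '00:30:00',
                'nnodes': 1, 'ppn': 8 }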

pylada.debug_queue

How to select the debug queue. The first part of the tuple is the keyword argument to modify when calling the pbs job; the second is its value.
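
For example, on a machine where the debug queue is selected by passing -q debug to qsub, one might use:

debug_queue = 'queue', 'debug'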

pylada.ipython_qstat()

An ipython magic function which returns all jobs submitted by the user. Once provided, it will be automatically imported into the ipython session by the pylada extension, where it is called qstat. It will change somewhat from one supercomputer to the next, depending on the type of resource manager it uses. Here is what the function looks like for slurm:

def ipython_qstat(self, arg):
  """ squeue --user=`whoami` -o "%7i %.3C %3t  --   %50j" """
  from subprocess import Popen, PIPE
  from IPython.utils.text import SList
  from getpass import getuser

  # finds user name.
  whoami = getuser()
  squeue = Popen(["squeue", "--user=" + whoami, "-o", "\"%7i %.3C %3t    %j\""], stdout=PIPE)
  result = squeue.stdout.read().rstrip().split('\n')
  result = SList([u[1:-1] for u in result[1:]])
  return result.grep(str(arg[1:-1]))

And this one is for the PBSpro resource manager:

def ipython_qstat(self, arg):
  """ Prints jobs of current user. """
  from subprocess import Popen, PIPE
  from IPython.utils.text import SList
  # get user jobs ids
  jobs   = SList(Popen(['qstat', '-f'], stdout=PIPE)                           \
                .communicate()[0].split('\n'))
  names  = [ u[u.find('=')+1:].lstrip().rstrip()                               \
             for u in jobs.grep('Job_Name') ]
  mpps   = [int(u[u.find('=')+1:]) for u in jobs.grep('Resource_List.ncpus')]
  states = [ u[u.find('=')+1:].lstrip().rstrip()                               \
             for u in jobs.grep('job_state') ]
  ids    = [u[u.find(':')+1:].lstrip().rstrip() for u in jobs.grep('Job Id')]
  return SList([ "{0:>10} {1:>4} {2:>3} -- {3}".format(id, mpp, state, name)   \
                 for id, mpp, state, name in zip(ids, mpps, states, names)])

These functions use IPython's SList to easily grep through the output.

Other/better snippets for other resource managers are welcome.

pylada.queues

List of strings defining the queues accessible to the users. They will be made available in %launch. It can be an empty tuple if “queues” are not relevant to the resource manager.

pylada.accounts

List of strings defining the accounts accessible to the users. They will be made available in %launch. It can be an empty tuple if “accounts” are not relevant to the resource manager.
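
For example (queue and account names are site-specific):

queues = 'regular', 'debug'
accounts = 'XY1234', 'AB5678'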
