Hi,

This is a summary of the meeting Wayne, Bryan and I had last week about 1) general issues regarding how we create the pbs scheduler files in cipres and 2) how to correctly set ppn in light of the recent OS upgrade on gordon.

Regarding the first point, we agreed that submit.py should implement a few straightforward rules to create the pbs run script based on the info in scheduler.conf and the properties of the compute resource the job will run on. I will slightly rewrite submit.py to remove the special cases for individual tools, according to the pseudo code below. The outcome of our discussion of the second point (regarding ppn) is captured in the pseudo code at the end.

Here's what we decided we need in the scheduler.conf files:

    jobtype              (i.e. does the job use mpi or not; default is "not mpi". "direct" is another
                          option, meaning that submit.py doesn't create the pbs script, e.g. raxmlLight)
    mpi_processes        (the total number of mpi processes, aka "np"; default is 1)
    threads_per_process  (default is 1)
    node_exclusive       (indicates that a shared-type queue should not be used; default is false)
    runhours             (user-specified maximum walltime)

This is the same as what we have at the moment, except that "nodes" has been removed and "node_exclusive" has been added to handle cases where mpi_processes times threads_per_process may be less than a full node's worth of cpus but we still want a full node to ourselves in a non-shared queue (e.g. we may need all the memory).

One of the problems we have been having when we talk about how a tool is to be run is that Mark has been the one creating the pise xml files that generate scheduler.conf and I've been the one maintaining submit.py. So when Wayne and Bryan describe the way a tool is to be run in terms of things like "ppn", "np", and "queue", Mark needs to ask me what he has to put in scheduler.conf to make that happen, and neither Mark nor I can easily verify, on our own, that we have in fact implemented what Wayne or Bryan recommended.

I think we can make this work more smoothly and reliably going forward if we specify all of the following whenever we talk about how a tool is to be run:

    tool                  (e.g. mrbayes 3.2.1)
    machine               (e.g. gordon)
    user input condition  (e.g. nruns x nchains <= 8)
    environment           (any "run specific" env vars that need to be set, e.g. OMP_NUM_THREADS)
    *jobtype
    *mpi_processes
    *threads_per_process
    *node_exclusive
    queue
    nodes
    ppn

The columns marked with an asterisk (*) correspond to what the pise xml file will put in scheduler.conf, and the last three columns indicate what submit.py should put in the pbs run script. Mark -- is there anything you'd like to add to this?

Finally, here's the logic submit.py will use to determine "queue", "nodes", and "ppn":

    number_of_cpus = mpi_processes * threads_per_process

    # cpus_per_node is 16 on gordon, 32 on trestles; the shared queue cpu increment is
    # 8 on gordon, 1 on trestles, so shared_queue_max_cpus is 8 on gordon, 31 on trestles.
    shared_queue_cpu_increment = get_cpu_increment(shared_queue, host)
    shared_queue_max_cpus = cpus_per_node - shared_queue_cpu_increment

    # Note that if the cpus requested (i.e. number_of_cpus) is between the max for the shared Q
    # and a full node's worth, we're opting to use a full node in the regular queue.
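    # Worked example on gordon (cpus_per_node = 16, increment = 8, so shared_queue_max_cpus = 8):
    # a job with mpi_processes = 8 and threads_per_process = 1 has number_of_cpus = 8, which fits
    # the shared queue below with 1 node and ppn = 8 * round_up(8/8) = 8; number_of_cpus = 12
    # would instead go to the regular queue with 1 node and ppn = 16.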
    if (not node_exclusive) and (number_of_cpus <= shared_queue_max_cpus):
        queue = shared_queue
        number_of_nodes = 1
        ppn = shared_queue_cpu_increment * round_up(number_of_cpus / shared_queue_cpu_increment)
    else:
        queue = regular_queue
        number_of_nodes = round_up(number_of_cpus / cpus_per_node)
        ppn = cpus_per_node

    # things like QOS that need to be set via "pbs -v"
    pbs_env_vars = get_pbs_env_vars(queue, host)

Terri
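P.S. To make the logic above concrete (and so Mark can sanity check it without asking me), here's a rough Python sketch of roughly how it might look in submit.py. The per-host numbers are the ones from our discussion; the dict names, the helper name choose_queue_nodes_ppn, and the "shared"/"regular" queue strings are just placeholders for illustration, not necessarily what will end up in the real script.

    import math

    # Per-host constants from the comments above (placeholder lookup tables; the real
    # submit.py may get these values some other way).
    CPUS_PER_NODE = {"gordon": 16, "trestles": 32}
    SHARED_QUEUE_CPU_INCREMENT = {"gordon": 8, "trestles": 1}

    def choose_queue_nodes_ppn(host, mpi_processes=1, threads_per_process=1,
                               node_exclusive=False):
        """Return (queue, number_of_nodes, ppn) for the pbs run script."""
        number_of_cpus = mpi_processes * threads_per_process
        cpus_per_node = CPUS_PER_NODE[host]
        increment = SHARED_QUEUE_CPU_INCREMENT[host]
        shared_queue_max_cpus = cpus_per_node - increment

        if not node_exclusive and number_of_cpus <= shared_queue_max_cpus:
            # Fits in the shared queue: one node, ppn rounded up to the
            # shared queue's cpu increment.
            queue = "shared"    # placeholder queue name
            number_of_nodes = 1
            ppn = increment * int(math.ceil(number_of_cpus / float(increment)))
        else:
            # Otherwise take whole nodes in the regular queue.
            queue = "regular"   # placeholder queue name
            number_of_nodes = int(math.ceil(number_of_cpus / float(cpus_per_node)))
            ppn = cpus_per_node
        return queue, number_of_nodes, ppn

    # e.g. 8 mpi processes, 1 thread each, on gordon:
    #   choose_queue_nodes_ppn("gordon", mpi_processes=8)  ->  ("shared", 1, 8)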