TODO:

- actually start using grun in batch mode for day-to-day DAS2/GridLab
  sanity checks.

- Add firewall info to griddefs.py and pass it on to the jobs via the
  environment; to test it out I need NetIbis with Alexandre's extensions
  and some examples (feedback from Rutger)

- If gridrun dies for some reason, user may still want to kill jobs based
  on their urls in status files.

- May want co-scheduling retries within the tool rather than breaking
  off run and reporting failure.
  Issues:
  => interaction with startup of user daemons
  => may want an entirely fresh run (new globus io servers as well)

- for MPICH-G2 runs, fallback to external globusrun is still useful?

- Python output / PIPE-EOF interaction?

- Scheduling support:
  - select sites based on node count, preference, etc
  - option grab as much as possible (dangerous..)


TODO (but not grun's fault):

- Currently fs0's pbs_mom deamons not always survive job cancellation :-(
  Find out if/how this can be avoided.

- To use nodes as well as cpus, the Globus PBS-jobmanager scripts need
  fixing elsewhere as on DAS-2.
  Investigate further, reproduce with globus-job-run, and ask PBS sites
  with multi-processors to fix it.

- Option -N <num> on fs0/PBS sometimes causes Ibis nameserver connection
  problems, while -H <num> works fine?

- PBS output is currently only available at end of run.
  Fixable somehow, or inherent limitation?
  For the moment user can redirect output to local file in sandbox,
  which can be retrieved remotely.


DONE:

- make command lines smaller, e.g., by means of environment
  parameters and more useful defaults.

- add Globus libs to default search paths so that updating PYTHONPATH
  is no longer necessary for DAS users.

- if stagein phase completes, but unsuccesfully (non-zero exit),
  run phase should not be done
  => GRAM limitation; have to hack around it in runit.ibis

- when building RSL from commandline, take special characters into account
  e.g. -c 'prog opt=value' may need to be
     "(executable=prog)(arguments=opt # '=' # value)"

- also create job/site dirs when not starting separate I/O servers,
  since the remote commands may still want to use the single GASS server,
  and also the job id needs to be reserved.

- do regular runs with matrix.sara.nl
  - jar fails?!
  - for the moment start a pool/nameserver by hand on ui.matrix,
    but really should add "gateway process" option to grun.

- One subjob terminates => all should terminate within timeout

- Optionally cleanup remote sandboxes after run =>
  currently implemented by means of an "rm" in the stageout phase
  when GRUN_CLEANUP is set.

- As soon as all subjobs have started, startup timer should be
  cancelled rather than run through the end; currently the remaining
  time is effectively added to the wall timeout.

- Test out scenario where whole Ibis directory tree is wrapped
  up in a jar file, staged in, and unpacked remotely, before actual run.
  Also pass name of the jar and class to the job

- Add support for (jobmanager-specific) rsl extensions, e.g., for skirit
  to use non-default queue to get more nodes than the default 2..

- Actually test all sites in griddefs.py
  => *many* network connectivity issues

