I have written some software which facilitates data reduction and other repetitive tasks
on the Titan cluster (or on any other pvm). See
man pvmloop for more information.
Before you can use pvmloop you need to start a pvm. This in turn requires setting up environment variables and modifying your
.rhosts file. This is documented in a general manner in the pvm3 users guide and reference manual, but can be a bit fiddly, so I have described the procedure in detail below.
In what follows I will assume that you are running on cronus, and that
your shell is
tcsh (to see what your
shell is, do
echo $SHELL).
pvm is installed in /usr/local/pvm3;
make sure the directory containing the pvm executable is in your path and do
which pvm to check
you can get to it.
IMPORTANT: this directory, and other parts of the path to be described
in a moment, will also need to be in your path on the slaves when
remote processes get executed on them. This means that you MUST
set your path in your shell rc file (.tcshrc or .cshrc)
rather than in your
.login file, since the latter does not
get sourced when e.g.
rsh is used.
Also, watch out for things being sourced conditionally, for example depending on whether the shell is a login shell.
If in doubt, to check what the actual path is on the slaves, do
rsh node01 'echo $path'
(be sure to use single quotes here). However, note that the environment inherited by processes spawned by pvm is the one in effect when the pvm daemons are started, so if you change your path definition in your shell rc files, for example to add the path to some needed software, then you will need to 'halt' your pvm and start it afresh.
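For example, after editing your rc file, a session along these lines (using the pvm console commands described further below) picks up the new environment:

pvm          (start the pvm console)
pvm> halt    (kill the daemons on all hosts)
pvm          (start a fresh pvm; the new path is now in effect)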
Second, you need to define the environment variables
PVM_ROOT and PVM_ARCH by inserting the lines
setenv PVM_ROOT /usr/local/pvm3
setenv PVM_ARCH SUN4SOL2
in your shell rc file.
Following this, you should insert lines to add
$PVM_ROOT/lib and ~/pvm3/bin/SUN4SOL2 to your path,
where the latter will contain your private pvm executables.
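Putting the pieces together, the relevant lines of your .tcshrc might look like the following (a sketch assuming the standard pvm3 layout; adjust the directories if your installation differs):

setenv PVM_ROOT /usr/local/pvm3
setenv PVM_ARCH SUN4SOL2
set path = ($path $PVM_ROOT/lib ~/pvm3/bin/$PVM_ARCH)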
pvm needs to be able to execute processes remotely, so you
need to have a
.rhosts file set up on all of the machines you intend
to use. For the titan cluster, our home directories are the same for all
the machines, so we only need to create one. Mine contains one line per host,
of the form 'hostname username', for cronus and for each of the slaves
(you will want to replace my username with yours!):

cronus kaiser
node01 kaiser
node02 kaiser
...
node08 kaiser

(hyperion = node01 etc.)
To check that this works, try executing something like
rsh node01 date
from cronus. You should see the usual output of the date command.
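To test all of the slaves in one go, a little tcsh loop like this helps (a sketch; I am assuming the slaves are node01 through node08):

foreach n (01 02 03 04 05 06 07 08)
    rsh node$n date
end

Any node that fails to print the date has a .rhosts (or path) problem that needs fixing before you proceed.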
Next, start the virtual machine by typing
pvm. You should get a prompt like
pvm> ;
if you type
conf you will see the configuration of the
pvm - right now it just contains the master, cronus.
Now try to 'enroll'
node01 in the pvm by typing
add node01 at the
pvm> prompt.
If you get an error message like
node01 Can't start pvmd
it probably means your path is not set up properly for remotely executed
commands on the host. If the
add command claims to be successful, type
conf to see the new configuration.
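The add command will accept several host names at once, so once node01 works you can enroll the remaining slaves in one go, for example:

pvm> add node02 node03 node04 node05 node06 node07 node08
pvm> conf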
Please refer to the excellent pvm3 users guide and reference manual for more information on installing and running pvm or writing programs to use the pvm library.
Now type halt at the
pvm> prompt to kill the pvm, and then restart it, this time giving
pvm a hostfile listing all of the nodes,
and check that all 8 nodes have been successfully enrolled by typing
conf as before.
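A hostfile is simply a list of host names, one per line; something like this (a sketch - again assuming the slaves are node01 through node08; cronus itself joins as the master when the daemon starts):

node01
node02
node03
node04
node05
node06
node07
node08

If you save this as, say, ~/hostfile then pvm ~/hostfile will start the pvm and enroll all of the hosts in one step.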
You can now exit from the pvm command interface by typing
quit, which leaves the virtual machine running.
A couple of useful pvm command line commands are
ps -a, to see all
of the pvm processes running, and
reset, which kills them all but leaves the pvm running.
If you read the pvmloop man page, you will see that it allows you to execute repetitive commands over the pvm. To do this, it uses two auxiliary commands,
pvmserver and topvm. These must live in your personal pvm executables directory, so do
cp ~kaiser/pvm3/bin/SUN4SOL2/pvmloop ~/pvm3/bin/SUN4SOL2/
and similarly for pvmserver and topvm.
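Spelled out in full (assuming your private directory does not yet exist):

mkdir -p ~/pvm3/bin/SUN4SOL2
cp ~kaiser/pvm3/bin/SUN4SOL2/pvmloop ~/pvm3/bin/SUN4SOL2/
cp ~kaiser/pvm3/bin/SUN4SOL2/pvmserver ~/pvm3/bin/SUN4SOL2/
cp ~kaiser/pvm3/bin/SUN4SOL2/topvm ~/pvm3/bin/SUN4SOL2/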
With luck you are now nearly ready to go. The final thing you need
to do is create the file that
pvmloop uses to figure out how many slaves to use; this file also associates
with each node a 'node string' which is used primarily to
generate the names of the slaves' local scratch disks.
As a test, go into some convenient temporary directory and
copy over the example file, which tells
pvmloop to use the nodes
node01 ... node06 with associated node strings
01 ... 06 . Then execute the command
pvmloop 6 'df -k /d%N'
What pvmloop does is to start up a set of
pvmserver processes, one on each of the slaves. It then
takes the 'command string template'
provided as its second argument
and generates copies of this, with the special substring '%N' replaced
by the node string. It then sends these strings to the
pvmserver processes using the pvm message passing machinery, and these
processes then use the
system() command to have them
executed. It does this in such a way that the
stdout and stderr of the child processes are sent back to the
pvmloop process running on the master to be
merged into the standard output and error streams of the
pvmloop process. Finally, the
pvmserver processes are sent a message telling them to exit.
Since the scratch disks on the titan cluster are mounted as
/d01 /d02 ..., this example
should tell you how much disk space is free on
each of the scratch disks on the 6 slaves.
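Concretely, with node strings 01 ... 06 the template above expands to the six command strings

df -k /d01
df -k /d02
df -k /d03
df -k /d04
df -k /d05
df -k /d06

which are executed on node01 ... node06 respectively.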
Please refer to the
pvmloop man page for more information,
and for how to use the other 4 special codes %n, %i, %I and %%.
There is also a command pvmmonitor that checks the CPU use on the slaves periodically. You can get a copy of it in the same way as the commands above.
You may want to inspect some of the scripts in e.g.
~kaiser/MACS/subaru1200/phase1/scripts to get an idea of how to use
these commands to reduce your data (though
pvmloop will work happily with any
other software that can be executed from scripts).
Note that ps -a in the
pvm command line interface
will only report the presence of the
pvmserver processes, and not the
actual tasks spawned by these servers, which get put in the background. Thus, for example,
if you kill a
pvmloop process and then use pvm's reset command to kill
the servers, the actual data processing tasks may still be running. This can lead to
serious side effects if you restart the process. I tend to use pvmmonitor in
such circumstances to figure out when it is safe to restart the process.
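One direct way to check for leftover tasks (a sketch - replace 'myprog' with whatever program your scripts actually run) is:

foreach n (01 02 03 04 05 06 07 08)
    echo node$n:
    rsh node$n 'ps -ef | grep myprog | grep -v grep'
end

Once this comes back empty on all of the slaves it should be safe to restart the processing.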
There is naturally something of a learning curve involved in developing
scripts to handle data reduction using a pvm. In this regard, note that
it is not necessary or desirable to book exclusive use of the titan cluster
while experimenting or debugging scripts. If you explore some of the scripts
mentioned above you will see that all details about the format of the
CCD mosaic camera, the size of the chips, the location of the scratch directories etc.
are externalized in database files (suffix
.db) which get
'required' by the scripts as necessary. If you follow this approach
then it is easy to set up a set of db files for a set of small test
images that can be used to test one's procedure without seriously
loading the cluster slaves' CPU or scratch disk capacity.
Note also that it is very easy to install a private version of PVM on any workstation, and that even a single workstation installation can be used to simulate the behaviour of the titan cluster (though of course with less processing power).