Warning: It was at this point (access permissions, remote shells, ...) that the expertise base of the hrothgar crew essentially vanished (and, coincidentally, the local gurus all left town). The following procedures work in the most basic sense: distributed executions under MPICH run. However, they would be awkward for a very large Beowulf, and they have a few other shortcomings. Refinements on these initial steps will be attempted once the real experts return from Scandinavia.
The NFS file strategy described in the Preliminaries document has the /scratchNM partitions on all PCs accessible to all other PCs, with /scratch00 on the Master PC earmarked for user home directories, and the remaining /scratchNM partitions intended for data staging and collection. This scheme is enabled by edits (done as root) of the files /etc/exports and /etc/fstab. The edited files on machine dan01, for example, are as follows:
/etc/exports on dan01

    /scratch01 192.168.8.40(rw,no_root_squash)
    /scratch01 192.168.8.42(rw,no_root_squash)
    /scratch01 192.168.8.43(rw,no_root_squash)

/etc/fstab on dan01

    /dev/hda1         /             ext2    defaults                           1 1
    /dev/hda5         /scratch01    ext2    defaults                           1 2
    /dev/fd0          /mnt/floppy   ext2    noauto                             0 0
    /scratch01        /scratch01    nfs     rw,user,unhide                     0 0
    dan00:/scratch00  /scratch00    nfs     rsize=8192,wsize=8192,hard,intr    0 0
    dan02:/scratch02  /scratch02    nfs     rsize=8192,wsize=8192,hard,intr    0 0
    dan03:/scratch03  /scratch03    nfs     rsize=8192,wsize=8192,hard,intr    0 0
    none              /proc         proc    defaults                           0 0
The lines in /etc/exports make the scratch partition on dan01 accessible to all other PCs within hrothgar, and the no_root_squash option lets root on one node operate as root on any of the mounted file systems. The "danNM:/scratchNM" entries in /etc/fstab explicitly mount the exported file systems from the other nodes.
The exports and fstab files on the other PCs have corresponding entries. To actually export the file system on a given PC, either reboot the machine or execute the command /usr/sbin/exportfs.
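Because the entries on each PC follow the same pattern, they can be generated rather than typed by hand. The following is a minimal sketch, not the procedure actually used on hrothgar. It assumes four nodes numbered 00 to 03 with addresses 192.168.8.40 to 192.168.8.43, as in the dan01 listings above; the function names node_ip, exports_for and fstab_nfs_for are hypothetical. It only prints lines to standard output and writes nothing to /etc.

```shell
#!/bin/sh
# Sketch: print the /etc/exports and NFS /etc/fstab lines for one node.
# Assumptions (hypothetical helpers, not part of the original setup):
#   nodes are numbered 00..03 and node 0M has address 192.168.8.(40+M).

NODES="00 01 02 03"
NFS_OPTS="rsize=8192,wsize=8192,hard,intr"

# node_ip NM -> IP address of node NM
node_ip() {
    # drop one leading zero so the number is not read as octal
    n=${1#0}
    echo "192.168.8.$((40 + $n))"
}

# exports_for NM -> export /scratchNM read-write to every other node
exports_for() {
    for other in $NODES; do
        [ "$other" = "$1" ] && continue
        echo "/scratch$1 $(node_ip "$other")(rw,no_root_squash)"
    done
}

# fstab_nfs_for NM -> mount every other node's scratch partition over NFS
fstab_nfs_for() {
    for other in $NODES; do
        [ "$other" = "$1" ] && continue
        echo "dan$other:/scratch$other /scratch$other nfs $NFS_OPTS 0 0"
    done
}

# Example: the entries for dan01
exports_for 01
fstab_nfs_for 01
```

The output for node 01 can be compared against the dan01 files shown above before anything is copied into /etc/exports or /etc/fstab as root.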
Finally, on all of the non-root machines, the command
Confession: the "solution" to remote permissions described here is too simplistic. Eventually, we will adopt something closer to the pam-based procedures described in the "official" Caltech/CACR Beowulf Building Tutorial. Again, however, the initial procedures were good enough to get MPICH up and running.
The first step was to list all nodes in the /etc/hosts file on each machine. For example, the /etc/hosts file on the first machine reads:
    127.0.0.1       localhost localhost.localdomain
    192.168.8.40    dan00.aa.bb.cc dan00 hrothgar hrothgar.aa.bb.cc
    192.168.8.41    dan01.aa.bb.cc dan01
    192.168.8.42    dan02.aa.bb.cc dan02
    192.168.8.43    dan03.aa.bb.cc dan03

(with aa.bb.cc replaced by the appropriate full domain extensions).
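Since every machine needs the same list, the host entries can also be produced by a short loop. This is only a sketch under stated assumptions: the DOMAIN value is the same aa.bb.cc placeholder used above, the function name hosts_entries is hypothetical, and node 00 is assumed to be the only machine carrying the hrothgar aliases.

```shell
#!/bin/sh
# Sketch: print /etc/hosts entries for the four-node cluster.
# Assumptions: DOMAIN is a placeholder; hosts_entries is a hypothetical helper.

DOMAIN="aa.bb.cc"

hosts_entries() {
    echo "127.0.0.1 localhost localhost.localdomain"
    for n in 0 1 2 3; do
        line="192.168.8.$((40 + $n)) dan0$n.$DOMAIN dan0$n"
        # the master (node 00) also answers to the cluster name
        if [ "$n" -eq 0 ]; then
            line="$line hrothgar hrothgar.$DOMAIN"
        fi
        echo "$line"
    done
}

hosts_entries
```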
Next, a .rhosts file was set up in the home directory of all users (including root) with contents:
    dan00.aa.bb.cc
    dan01.aa.bb.cc
    dan02.aa.bb.cc
    dan03.aa.bb.cc
    hrothgar.aa.bb.cc
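Creating this file for each user can be scripted. The sketch below is an illustration rather than the procedure used on hrothgar: write_rhosts is a hypothetical helper, DOMAIN is the aa.bb.cc placeholder, and the example writes into a temporary directory instead of a real home directory. The chmod step reflects the usual behavior of the r-commands, which generally ignore a .rhosts file that anyone other than its owner can write to.

```shell
#!/bin/sh
# Sketch: write a .rhosts file listing every node, then restrict its permissions.
# Assumptions: DOMAIN is a placeholder; write_rhosts is a hypothetical helper.

DOMAIN="aa.bb.cc"

# write_rhosts DIR -> create DIR/.rhosts with one trusted host per line
write_rhosts() {
    target="$1/.rhosts"
    for host in dan00 dan01 dan02 dan03 hrothgar; do
        echo "$host.$DOMAIN"
    done > "$target"
    # rsh and rlogin typically reject a .rhosts that other users can write to
    chmod 600 "$target"
}

# Example: write into a scratch directory rather than a real home directory
demo_dir=$(mktemp -d)
write_rhosts "$demo_dir"
cat "$demo_dir/.rhosts"
```

For a real account, the argument would be that user's home directory, and the file must end up owned by that user.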
Warning: The initial cut here was really a kludge, consisting of a series of awkward steps:
/usr/sbin/adduser username on each of the other processors.
The four hrothgar PCs each have 128 Mbytes of memory, but the systems originally "saw" only 64 Mbytes. This was fixed by a simple "append" entry in the configuration file /etc/lilo.conf, which was modified to
    boot=/dev/hda
    map=/boot/map
    install=/boot/boot.b
    prompt
    timeout=50
    image=/boot/vmlinuz-2.0.34-0.6
            label=linux
            root=/dev/hda1
            read-only
            append="mem=128M"
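An edit to /etc/lilo.conf takes effect only after the boot loader is reinstalled by running /sbin/lilo as root and the machine is rebooted. After the reboot, the memory the kernel sees can be checked. The following is a small sketch of such a check; it assumes the kernel's /proc/meminfo contains a "MemTotal: <n> kB" line, and mem_total_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report the memory the kernel sees, in whole Mbytes.
# Assumption: the meminfo text contains a line of the form "MemTotal: <n> kB".

# mem_total_mb: read meminfo text on standard input, print total Mbytes
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }'
}

# Example: query the running kernel, if /proc/meminfo is readable
if [ -r /proc/meminfo ]; then
    echo "kernel sees $(mem_total_mb < /proc/meminfo) Mbytes"
fi
```

The reported figure is normally a little below the installed amount, because the kernel reserves some memory for itself; a value near 128 rather than near 64 indicates that the append entry was picked up.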