System-Level Configurations (Hackerware Version)

At this point, all PCs in the hrothgar cluster are running Linux and sending packets out to the switch. Before loading the MPICH communications software and attempting the first distributed programs, a number of changes/additions are needed in the system configuration files on the individual PCs.

Warning: It was at this point (access permissions, remote shells, and the like) that the expertise base of the hrothgar crew essentially vanished (and, coincidentally, the local gurus all left town). The following procedures work in the most basic sense that distributed executions under MPICH run. However, they would be awkward for a very large Beowulf, and they have a few other shortcomings. Refinements on these initial steps will be attempted once the real experts return from Scandinavia.


Enabling The NFS File System

The NFS file strategy described in the Preliminaries document has the /scratchNM partitions on all PCs accessible to all other PCs, with /scratch00 on the Master PC earmarked for user home directories, and the remaining /scratchNM partitions intended for data staging and collection. This scheme is enabled by edits (done as root) of the files /etc/exports and /etc/fstab. The edited files on machine dan01, for example, are as follows:

/etc/exports on dan01

/scratch01 192.168.8.40(rw,no_root_squash)
/scratch01 192.168.8.42(rw,no_root_squash)
/scratch01 192.168.8.43(rw,no_root_squash)

/etc/fstab on dan01

/dev/hda1          /             ext2   defaults                          1 1
/dev/hda5          /scratch01    ext2   defaults                          1 2
/dev/fd0           /mnt/floppy   ext2   noauto                            0 0
/scratch01         /scratch01    nfs    rw,user,unhide                    0 0
dan00:/scratch00   /scratch00    nfs    rsize=8192,wsize=8192,hard,intr   0 0
dan02:/scratch02   /scratch02    nfs    rsize=8192,wsize=8192,hard,intr   0 0
dan03:/scratch03   /scratch03    nfs    rsize=8192,wsize=8192,hard,intr   0 0
none               /proc         proc   defaults                          0 0

The lines in /etc/exports make the scratch partition on dan01 accessible to all other PCs within hrothgar, and the no_root_squash option lets root on one node operate as root on any of the mounted file systems. The "danNM:/scratchNM" entries in /etc/fstab explicitly mount the exported file systems from the other nodes.

The exports and fstab files on the other PCs have corresponding entries. To actually export the file system on a given PC, either reboot the machine or execute the command /usr/sbin/exportfs.
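The "corresponding entries" on the other PCs follow a regular pattern, so they can be generated rather than typed. The following is only a sketch, assuming the numbering seen in the dan01 example (dan00 at 192.168.8.40, nodes numbered 0 through 3); the names NET, FIRST, LAST, gen_exports and gen_fstab_nfs are made up for illustration. It prints to the screen and does not touch /etc.

```shell
#!/bin/sh
# Assumed cluster layout, taken from the dan01 example above.
NET=192.168.8      # network prefix
FIRST=40           # last octet of dan00
LAST=3             # highest node number (dan03)

# gen_exports N: one /etc/exports line per *other* node, exporting /scratchNN.
# N must be a plain integer (1, not 01).
gen_exports() {
    me=$1
    i=0
    while [ "$i" -le "$LAST" ]; do
        if [ "$i" -ne "$me" ]; then
            printf '/scratch%02d %s.%d(rw,no_root_squash)\n' \
                "$me" "$NET" "$((FIRST + i))"
        fi
        i=$((i + 1))
    done
}

# gen_fstab_nfs N: one NFS mount line for each *other* node's scratch partition.
gen_fstab_nfs() {
    me=$1
    i=0
    while [ "$i" -le "$LAST" ]; do
        if [ "$i" -ne "$me" ]; then
            printf 'dan%02d:/scratch%02d /scratch%02d nfs rsize=8192,wsize=8192,hard,intr 0 0\n' \
                "$i" "$i" "$i"
        fi
        i=$((i + 1))
    done
}

# Print the entries for dan01; review them before merging into /etc by hand.
gen_exports 1
gen_fstab_nfs 1
```

For node 1 this reproduces the three exports lines and the three "danNM:/scratchNM" mount lines shown for dan01. The local-disk lines of /etc/fstab are not generated and still need to be kept as installed.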

Finally, on all of the non-master machines, the command

ln -s /scratch00 /home
was used to give the system-wide user home areas a standard name.
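One caution with this command: if /home already exists as a directory, ln -s does not replace it but creates the link inside it (as /home/scratch00). A guarded version might look like the sketch below; link_home is a hypothetical helper name, and the demonstration runs in a throwaway directory rather than on the real /home.

```shell
#!/bin/sh
# link_home TARGET LINK: create LINK -> TARGET only when LINK does not exist yet.
link_home() {
    target=$1
    link=$2
    if [ -L "$link" ]; then
        echo "$link is already a symbolic link; leaving it alone"
        return 0
    fi
    if [ -e "$link" ]; then
        echo "$link exists and is not a link; move it aside first" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Demonstration in a temporary directory (stand-ins for /scratch00 and /home).
demo=$(mktemp -d)
mkdir "$demo/scratch00"
link_home "$demo/scratch00" "$demo/home" && ls -l "$demo/home"
rm -rf "$demo"
```

On a real node the call would be link_home /scratch00 /home, run as root.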


Remote Shell Permissions

Confession: the "solution" to remote permissions described here is too simplistic. Eventually, we will adopt something closer to the pam-based procedures described in the "official" Caltech/CACR Beowulf Building Tutorial. Again, however, the initial procedures were good enough to get MPICH up and running.

The first step was to list all nodes in the /etc/hosts file on each machine. For example, the /etc/hosts file on the first machine reads:

127.0.0.1      localhost localhost.localdomain
192.168.8.40   dan00.aa.bb.cc dan00 hrothgar hrothgar.aa.bb.cc
192.168.8.41   dan01.aa.bb.cc dan01
192.168.8.42   dan02.aa.bb.cc dan02
192.168.8.43   dan03.aa.bb.cc dan03
(with aa.bb.cc replaced by the appropriate full domain extensions).
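Since the same host table goes on every machine, it can be produced once and copied around. The sketch below assumes the addressing used in the example (dan00 at 192.168.8.40, four nodes, the master also answering to "hrothgar"); gen_hosts and the variable names are illustrative only, and the output goes to the screen rather than to /etc/hosts.

```shell
#!/bin/sh
# Assumed cluster layout, matching the example /etc/hosts above.
NET=192.168.8          # network prefix
FIRST=40               # last octet of dan00
LAST=3                 # highest node number (dan03)
MASTER_ALIAS=hrothgar  # extra name carried by node 0

# gen_hosts DOMAIN: print a hosts table for nodes dan00..danLAST.
gen_hosts() {
    domain=$1
    printf '127.0.0.1 localhost localhost.localdomain\n'
    i=0
    while [ "$i" -le "$LAST" ]; do
        name=$(printf 'dan%02d' "$i")
        line="$NET.$((FIRST + i)) $name.$domain $name"
        # The master (node 0) also answers to the cluster name.
        if [ "$i" -eq 0 ]; then
            line="$line $MASTER_ALIAS $MASTER_ALIAS.$domain"
        fi
        printf '%s\n' "$line"
        i=$((i + 1))
    done
}

# aa.bb.cc is the placeholder domain from the text; substitute the real one.
gen_hosts aa.bb.cc
```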

Next, a .rhosts file was set up in the home directory of all users (including root) with contents:

dan00.aa.bb.cc
dan01.aa.bb.cc
dan02.aa.bb.cc
dan03.aa.bb.cc
hrothgar.aa.bb.cc
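Writing the same .rhosts into many home directories is easy to script. The sketch below is an assumption-laden example, not the procedure actually used on hrothgar: write_rhosts is a hypothetical helper, the host list is hard-coded from the example, and the chmod 600 reflects the fact that rshd generally ignores a .rhosts file that is writable by anyone other than its owner.

```shell
#!/bin/sh
# write_rhosts DIR DOMAIN: create DIR/.rhosts listing every cluster host name.
write_rhosts() {
    dir=$1
    domain=$2
    file="$dir/.rhosts"
    : > "$file"                       # start from an empty file
    for h in dan00 dan01 dan02 dan03 hrothgar; do
        printf '%s.%s\n' "$h" "$domain" >> "$file"
    done
    chmod 600 "$file"                 # readable and writable by the owner only
}

# Demonstration in a temporary directory standing in for a home directory.
demo=$(mktemp -d)
write_rhosts "$demo" aa.bb.cc
cat "$demo/.rhosts"
rm -rf "$demo"
```

When run as root for another user's home directory, the file would also need to be handed to that user with chown. A quick check afterwards is a command such as rsh dan01 hostname, which should print the remote host name without asking for a password.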


User Accounts and Directories

Warning: The initial cut here was really a kludge, consisting of a series of awkward steps:

  1. A new user was added (by root) using the appropriate tool of the root control panel (under X-windows) on hrothgar.
  2. The same user was then immediately added in an "incomplete" fashion on the other nodes by executing
    /usr/sbin/adduser username
    on each of the other processors.
  3. The user lines in /etc/passwd on the non-master nodes were replaced by those from the master node, and passwords were explicitly reset on those nodes.
  4. The /etc/group file from the master node was copied to the other nodes.
This gave the "installation personnel" accounts on all machines. It is by no means clear that such steps would be necessary for generic users once the system is used for real K-12 applications.
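Steps 2 and 4 above lend themselves to a loop over the non-master nodes. The sketch below only prints the commands it would run (a dry run), on the assumption that root can use rsh and rcp between nodes as set up in the previous section; NODES and plan_user are made-up names. Step 3, the hand edit of /etc/passwd and the password reset, is deliberately left out.

```shell
#!/bin/sh
# Non-master nodes, as named in the hrothgar examples.
NODES="dan01 dan02 dan03"

# plan_user USERNAME: print the per-node commands for steps 2 and 4.
plan_user() {
    user=$1
    for n in $NODES; do
        echo "rsh $n /usr/sbin/adduser $user"     # step 2: add the user on node n
        echo "rcp /etc/group $n:/etc/group"       # step 4: copy the master's group file
    done
}

# "alice" is a placeholder user name.
plan_user alice
```

Once the printed commands look right, they can be pasted into a root shell on the master, or the output piped to sh.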


Customizing LILO

The four hrothgar PCs each have 128 Mbytes of memory, but the systems originally "saw" only 64 Mbytes. This was fixed by a simple "append" entry in the configuration file /etc/lilo.conf, which was modified to read:

boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=50
image=/boot/vmlinuz-2.0.34-0.6
        label=linux
        root=/dev/hda1
        read-only
        append="mem=128M"
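An edit to /etc/lilo.conf takes effect only after /sbin/lilo has been rerun (as root) to reinstall the boot map and the machine has been rebooted. Whether the kernel then sees the full memory can be checked from /proc/meminfo. The sketch below assumes the file contains a "MemTotal:" line giving a value in kB; mem_total_mb is a hypothetical helper name.

```shell
#!/bin/sh
# mem_total_mb FILE: print the MemTotal value of a meminfo-style file in whole Mbytes.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "$1"
}

# Report what the running kernel sees, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    mem_total_mb /proc/meminfo
fi
```

The figure reported is normally a little below the installed memory, since the kernel keeps some for itself, so a result somewhat under 128 is expected on the hrothgar PCs, while a result near 64 would mean the append line was not picked up.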