How to Build an AppleSeed:
A Parallel Macintosh Cluster for Numerically Intensive Computing

by

Viktor K. Decyk, Dean E. Dauger, and Pieter R. Kokelaar

Department of Physics and Astronomy
University of California, Los Angeles
Los Angeles, CA 90095-1547

email: decyk, dauger, and pekok@physics.ucla.edu

Abstract

We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster.

Introduction

In recent years there has been a growing interest in clustering commodity computers to build inexpensive parallel computers. A number of projects have demonstrated that for certain classes of problems, this is a viable approach for cheap, numerically intensive computing. The most common platform for building such a parallel cluster is based on the Pentium processor running the Linux version of Unix [1]. When Apple Computer introduced the Macintosh G3 computer based on the Motorola PowerPC 750 processor, we decided to investigate whether a cluster based on the G3 was practical.

This investigation was initially motivated by the impressive single-node performance we achieved on our well-benchmarked suite of plasma particle-in-cell (PIC) simulation codes [2-3] on the Macintosh G3, as shown in Table I. This was due in part to the availability of an excellent optimizing Fortran compiler for the Macintosh produced by the Absoft Corporation [4]. Not only was the performance faster than that of the Pentiums, it was comparable to the performance achieved on some of the Crays.

Table I.

3D Particle Benchmarks

-----------------------------------------
The following are times for a 3D particle simulation, using 294,912 particles and a 32x16x64 mesh for 425 time steps. Push Time is the time to update one particle's position and deposit its charge, for one time step. Loop Time is the total time for running the simulation minus the initialization time.
-----------------------------------------
Computer Push Time Loop Time

SGI Origin2000/R10000: 3430 nsec. 447.8 sec.
Macintosh G3/450: 5297 nsec. 691.4 sec.
Cray Y-MP: 5650 nsec. 741.1 sec.
Macintosh G3/350: 5966 nsec. 781.2 sec.
Intel Pentium II/450: 6230 nsec. 804.6 sec.
Macintosh G3/300: 6390 nsec. 837.3 sec.
Intel Pentium II/300: 9040 nsec. 1172.8 sec.
Macintosh G3/266: 9185 nsec. 1195.3 sec.
IBM SP2, 1 proc: 10342 nsec. 1335.4 sec.
iMac/233: 10410 nsec. 1358.5 sec.
Cray T3E-900, 1 proc: 13364 nsec. 1702.0 sec.
-----------------------------------------

A further motivation to build the cluster came when we realized that the MacOS had a native message-passing applications programming interface (API), called the Program-to-Program Communications (PPC) Toolbox [5]. It has been there since 1990, and is used by AppleShare and Apple Events. We already had many programs written using the Message-Passing Interface (MPI) [6], a common message-passing API used on high-performance parallel computers. The similarity of the native PPC Toolbox message-passing facility to the low-level features of MPI further encouraged us to build the Macintosh cluster.

Our PIC codes are used in a number of High-Performance Computing projects, such as modeling fusion reactors [7] and advanced accelerators [8]. For these projects, massively parallel computers are required, such as the 512 node Cray T3E at NERSC. However, it would be very convenient if code development and student projects could be performed on more modest but user-friendly parallel machines such as Macintosh clusters. It is also preferable that the resources of the large computers be devoted only to the large problems which require them.

Initial Software Implementation

Although a complete implementation of MPI has many high-level features (such as user-defined datatypes and division of processes) not available in the PPC Toolbox, these features were generally not needed by our PIC codes. It was therefore straightforward to write a partial implementation of MPI (34 subroutines) based on the PPC Toolbox, which we call MacMPI. The PPC Toolbox currently uses the AppleTalk protocol, which can run on Ethernet hardware but does not require an IP address. Apple has announced that a TCP/IP version of the PPC Toolbox will be available in the future. The entire MacMPI library was first written in Fortran77, making use of Fortran extensions for creating and dereferencing pointers available in the Absoft compiler. A C language interface to MacMPI was later added in response to requests from other users.

The only complicated subroutine was the initialization procedure, MPI_INIT. To initialize the cluster, a nodelist file is read which contains a list of n computer names and zones which are participating in the parallel computation. The node which has this file is designated the master node (node 0). The master initiates a peer connection with each of the other participating nodes (1 through n-1), and then passes to node 1 the list of remaining nodes (2 through n-1). Node 1 then establishes a peer connection with them, and passes on the list of remaining nodes (3 through n-1) to node 2, and so on. The last node receives a null list and does not pass it on further. Each node also establishes a connection to itself, as required by MPI. We have written a utility called Launch Den Mother to automatically create the nodelist file and copy and launch the executables on the nodes. The executable file can also be copied to each node and started manually.
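
The order in which these connections are made can be illustrated with the following stand-alone sketch (our illustration only; the actual MPI_INIT opens PPC Toolbox connections rather than printing a schedule, and the number of nodes n comes from the nodelist file rather than a parameter statement):

      program chain
c     illustration of the connection order described above (prints a
c     schedule only; the real MPI_INIT opens PPC Toolbox connections)
      integer n, i
      parameter (n=8)
c     the master (node 0) connects to all other nodes, then hands the
c     list of nodes 2 through n-1 to node 1, and so on down the chain
      write (*,*) 'node 0 connects to nodes 1 through', n-1
      do 10 i = 1, n-2
         write (*,*) 'node', i, 'connects to nodes', i+1, 'through',
     &      n-1
   10 continue
      write (*,*) 'node', n-1, 'receives a null list'
c     in addition, every node opens a connection to itself
      end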

Once the MacMPI library was implemented, we were able to port the parallel PIC codes from the Cray T3E and IBM SP2 to the Apple Macintosh cluster without modification. This library and related files and utilities are available at our web site: http://exodus.physics.ucla.edu/appleseed/appleseed.html. For a simple introduction to MPI programming, we recommend Using MPI [9].

Recipe for Building a Simple Cluster

The easiest way to build a Macintosh cluster is to first obtain a number of Macintosh G3 computers. All the current models have built-in Fast Ethernet adapters. Next obtain a Fast Ethernet switch containing at least one port for each Macintosh, and a corresponding number of Category 5 Ethernet cables with RJ-45 jacks. Plug one end of each cable into the Ethernet jack on each Mac and the other end into a port on the switch. Turn everything on. As far as hardware is concerned, that's all there is.

To set up the Macintosh for parallel processing in MacOS 8.1 and higher, one must set the AppleTalk Control Panel to use the appropriate Fast Ethernet adapter and verify in the Chooser that AppleTalk is active. Next, the computer name must be set and Program Linking enabled in the File Sharing Control Panel. Finally, in the Users & Groups Control Panel, one must allow Guests to link. In addition, it is strongly recommended that the sleep time in the Energy Saver Control Panel be set to Never (although it is fine to let the monitor go to sleep). A running Fortran or C program may appear to be inactive to the MacOS, and if the Mac is put to sleep, it will suddenly start to run very slowly.

To run a parallel program on the cluster, you should download 3 additional items from our web site: the Launch Den Mother utility, the AutoGuest Extension, and the Parallel Fractal Demo. Copy the AppleSeed folder containing the Launch Den Mother to the top directory of each Mac in the cluster, and drag and drop the AutoGuest Extension onto the System Folder. The software installation is now complete.

To test it, we recommend you first run the Parallel Fractal Demo on a single node by double-clicking it. Make a note of the execution time and speed. Then run it in parallel by dragging and dropping the Parallel Fractal Demo onto the Launch Den Mother, selecting the computers you want to run it on, and clicking on "Launch Executables". By running the code in parallel, you should observe a noticeable speedup. To write your own parallel software, see the section later in this paper entitled Using MacMPI.

AppleSeed Hardware Implementation

Our cluster of 16 Macintosh machines consists of various models purchased at different times during the last 18 months. The first 8 machines purchased were G3/266s and the next 4 were G3/300s. The most recent acquisitions were 4 G3/350 (known as "Blue and White") machines. The baseline Macintosh G3 running at 350 MHz currently (August, 1999) costs $1299 at UCLA. This machine is a tower model with 64 MB RAM, 1 MB L2 cache, a 6 GB hard drive, and a CD-ROM drive. We upgrade the memory of each Macintosh by adding 256 MB (current cost $338), so that the total memory of each of the latest Macintoshes is 320 MB.

The Macintosh G3/350 comes with a built-in 100BaseT Ethernet adapter. In order to keep Internet traffic separate from the MPI traffic, we purchase an additional 100BaseT PCI Fast Ethernet adapter, currently from Asanté (cost $49), so that the total cost of each of the latest Macintoshes is $1686.

For a simple introduction to Macintosh networking, we recommend the MacOS 8 Bible [10]. If only two Macs are being clustered, the only additional equipment needed is a Category 5 cross-over cable. We made our own cables, which otherwise would have cost about $8 apiece. A hub or switch is required to cluster more than 2 Macintoshes. If more than 4 nodes are being networked, a switch gives better performance but costs more. We currently have a 16-port Asanté switch and a 24-port Cisco switch (the cost of such switches is around $1500).

If purchased today (August, 1999), the total cost of a 16 node cluster is about $28,000. Such a cluster would contain over 5 GB of RAM and nearly 100 GB of disk space. This cost does not include monitors. Generally, each node does not need its own monitor. A number of them can share a monitor either via a monitor switch box or via software such as Apple's Network Assistant or Farallon's Timbuktu. If the machines are close together, a monitor switch box is convenient (we use a KVM switch from Black Box). If the machines are physically apart, software exists which allows one to observe and control Macintoshes remotely. We have used the Network Assistant software successfully, although it takes network bandwidth away from the application. Figure 1 shows a configuration with 8 G3/266 computers.
 


Figure 1: AppleSeed cluster of 8 Apple Macintosh G3/266 computers with iMac.

Our cluster has two networks running simultaneously. MacMPI currently uses AppleTalk with Fast Ethernet (100BaseT). This network has no other nodes on it, in order to maximize performance and enhance security. In addition, the Macintoshes can be connected to the Internet using the built-in Ethernet (10/100 BaseT) running TCP/IP. This gives the cluster access to the outside world and enables importing and exporting files using an ftp program. Note that we could have built this cluster from any PowerPC Macintoshes. We chose the G3 only because of its superior performance.

Performance

The performance of this cluster was excellent for certain classes of problems, mainly those where communication was small compared to the calculation and the message packet size was large. Results for the large 3D benchmark described in Ref. [3] are summarized in Table II. One can see that the Mac cluster performance was comparable to that achieved by the Cray T3E-900 and the IBM SP2/266 in this case. Indeed, the recent advances in computational performance are astonishing. A cluster of 4 Macintoshes now has more computational power and memory than a 4 processor Cray Y-MP, one of the best supercomputers of a decade ago, for less than one thousandth of the cost!

Table II.

3D Particle Benchmarks

-----------------------------------------
The following are times for a 3D particle simulation, using 7,962,624 particles and a 64x32x128 mesh for 425 time steps. Push Time is the time to update one particle's position and deposit its charge, for one time step. Loop Time is the total time for running the simulation minus the initialization time.
-----------------------------------------
Computer Push Time Loop Time

Mac G3/266 cluster, 8 proc: 1496 nsec. 5891.2 sec.
Mac G3/266 cluster, 4 proc: 3231 nsec. 11929.6 sec.
Mac G3/266 cluster, 2 proc: 7182 nsec. 25738.5 sec.
-----------------------------------------
Cray T3E-900, w/MPI, 8 proc: 1800 nsec. 6196.3 sec.
Cray T3E-900, w/MPI, 4 proc: 3844 nsec. 13233.7 sec.
-----------------------------------------
IBM SP2, w/MPL, 8 proc: 2104 nsec. 7331.1 sec.
-----------------------------------------
 

To determine what packet sizes gave good performance, we developed a ping-pong and swap benchmark (where pairs of processors exchange packets of equal size), and bandwidth was defined to be twice the packet size divided by the time to exchange the data. Figure 2 shows a typical curve. As one can see, high bandwidth is achieved for packet sizes of around 2^15 (32,768) words. The best bandwidth rates achieved on this test are less than 20% of the peak speed of the 100 Mbps hardware. We have found that for small systems with 4 nodes or less, hubs actually give better performance than switches.

Figure 2: Bandwidth (MBytes/sec) for 2 processors exchanging data simultaneously as a function of packet size, with an Asanté Fast Ethernet Switch.
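
The curve in Figure 2 was obtained from a benchmark of this kind. The following is a minimal sketch of such a swap test in Fortran77 (our illustration, not part of the MacMPI distribution); it assumes 4-byte REAL words, an even number of nodes, and that MPI_WTIME and the blocking send and receive are among the calls implemented:

      program swap
c     pairs of nodes exchange a packet; bandwidth is defined as twice
c     the packet size divided by the time to exchange the data
c     (mpif.h is assumed to declare MPI_WTIME as double precision)
      include 'mpif.h'
      integer nwords
      parameter (nwords=32768)
      real buf(nwords), rbuf(nwords)
      integer ierr, rank, nproc, partner, i, status(MPI_STATUS_SIZE)
      double precision t0, t1, bw
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
c     pair node 0 with node 1, node 2 with node 3, and so on
c     (run on an even number of nodes)
      partner = rank + 1 - 2*mod(rank,2)
      do 10 i = 1, nwords
         buf(i) = real(i)
   10 continue
      t0 = MPI_WTIME()
c     even nodes send first, odd nodes receive first, to avoid deadlock
      if (mod(rank,2).eq.0) then
         call MPI_SEND(buf,nwords,MPI_REAL,partner,1,MPI_COMM_WORLD,
     &      ierr)
         call MPI_RECV(rbuf,nwords,MPI_REAL,partner,2,MPI_COMM_WORLD,
     &      status,ierr)
      else
         call MPI_RECV(rbuf,nwords,MPI_REAL,partner,1,MPI_COMM_WORLD,
     &      status,ierr)
         call MPI_SEND(buf,nwords,MPI_REAL,partner,2,MPI_COMM_WORLD,
     &      ierr)
      endif
      t1 = MPI_WTIME()
c     bandwidth in bytes/sec, assuming 4-byte words
      bw = 2.0d0*dble(nwords)*4.0d0/(t1 - t0)
      if (rank.eq.0) write (*,*) 'bandwidth (bytes/sec) = ', bw
      call MPI_FINALIZE(ierr)
      end

Repeating the exchange for a range of packet sizes produces a curve like the one in Figure 2.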

For the 3D benchmark case described in Table II, the average packet size varied between 2^13 and 2^17 words, which is right in the middle of the region of good performance. Benchmarks for smaller problems, such as the 2D case discussed in Ref. [3], did not scale as well, but still gave good performance.

Using MacMPI

To compile and run your own Fortran source code, two additional files are needed, the library MacMPI.f and the include file mpif.h. Creating an executable with the Absoft compiler is straightforward. If a user has a Fortran 77 program called test.f and a subroutine library called testlib.f, the following command will link with MacMPI.f and produce an executable optimized for the G3 architecture:

f77 -O -Q92 test.f testlib.f MacMPI.f

The include file mpif.h must also be present. One can also run the code with automatic double precision, as follows:

f77 -O -N113 -N2 -Q92 test.f testlib.f MacMPI.f

This option was used by our benchmark codes. It is possible to create a makefile either manually or via a graphical interface, although these makefiles differ from Unix-style makefiles.
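
For readers writing their first MPI program, the following is a minimal sketch of what such a test.f might contain (our illustration, not a file from the AppleSeed distribution); it assumes the basic point-to-point calls are among the 34 subroutines implemented in MacMPI:

      program test
c     each node reports in, then node 1 sends a number to node 0
      include 'mpif.h'
      integer ierr, rank, nproc, status(MPI_STATUS_SIZE)
      real x
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
      write (*,*) 'hello from node', rank, ' of', nproc
      if (nproc.gt.1) then
         if (rank.eq.1) then
            x = 3.14
            call MPI_SEND(x,1,MPI_REAL,0,99,MPI_COMM_WORLD,ierr)
         else if (rank.eq.0) then
            call MPI_RECV(x,1,MPI_REAL,1,99,MPI_COMM_WORLD,status,
     &         ierr)
            write (*,*) 'node 0 received', x
         endif
      endif
      call MPI_FINALIZE(ierr)
      end

With mpif.h and MacMPI.f in the same folder, this program can be compiled and linked with the f77 command given above (omitting testlib.f if it is not needed).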

To run a Fortran 90 program, one should compile the Fortran 90 program and MacMPI.f separately, as follows:

f77 -O -Q92 -c MacMPI.f
f90 -O -604 -c test.f
f90 -O -604 test.f.o MacMPI.f.o

To compile and run a C source program, one needs the library MacMPI.c and the header file mpi.h. With the Absoft C compiler, one creates the executable with the following command:

acc -O -A -N92 test.c MacMPI.c

With the Absoft compilers, C and Fortran programs can easily call one another. To set up the Macintosh for parallel processing, follow the instructions described in the above section on building a simple cluster.

Launch Den Mother

A parallel application can be started either manually or automatically. A utility called Launch Den Mother (and associated Launch Puppies) has been written to automate the procedure of selecting remote computers, copying the executable and associated input files, and starting the parallel application on each computer. This utility can be downloaded from our web site.

Before running Launch Den Mother, each participating Macintosh must have the Launch Puppy utility located in a folder called AppleSeed, which must reside in the top directory of the startup disk. In addition, the AutoGuest INIT Extension available from Apple Computer [11] should be installed in the Extensions folder inside the System Folder on each participating Macintosh. AutoGuest permits the Finder of each computer to start the Launch Puppy, if guest link access has been selected in the Users & Groups Control Panel, without asking for further verification from the owner of each machine. Launch Den Mother operates best with MacOS 8.0 or later.

The Launch Den Mother utility needs to reside only on the computers which will be initiating a parallel application, although we normally install it on all the computers. After starting Launch Den Mother, a single dialog box appears, as shown in Figure 3.
 


Figure 3. Launch Den Mother utility dialog box.

First one selects the application to run. In the upper left hand corner of the dialog box appears a list of files, from which one selects the files which will be copied to the other Macs and executed. This list of files selected is displayed in the lower left hand corner of the dialog box. In Figure 3, one can see that from the Erica folder, we have chosen two files, an executable file called lattice.out, and an input file called input.lattice, which is needed by lattice.out. Only one executable can be selected. To execute our Demo program, look for and select Parallel Fractal Demo.

Then one selects the computers to run on. In the upper right hand corner of the dialog box appears a list of available Macintoshes whose owners have permitted Program Linking in the File Sharing Control Panel. One selects from this list the computers one wishes to run on. In Figure 3, four computers from the Local Zone have been selected for execution, uclapic7, uclapic10, uclapic11, and uclapic12. Other computers were available, but two of them were already busy running a parallel job (Aries and Io). It is not required that the user's computer be one of those selected for running the parallel application.

Once the application and computers have been selected, one clicks on Launch Executables. The files are then copied to the AppleSeed folder on each computer and each application is started up. MacMPI controls any further communication between nodes. That's all there is to it.

The Launch Den Mother works by sending an Apple Event to the Finder on each remote Macintosh, requesting that the Launch Puppy be started up. The Launch Puppy must be in the AppleSeed folder so that the Finder can find it. This remote Apple Event requires MacOS 8.0 to work properly. Once all the remote Launch Puppies are started up, the Den Mother sends the requested files to each Puppy, which then copies them to the AppleSeed folder on the local Macintosh. The Puppy starts its copy of the parallel application, sends a message to the Den Mother, and the Den Mother tells the Puppy to quit. After all the Puppies have been told to quit, the Den Mother launches the application on node zero and quits, and MacMPI takes over.

If the user selected his or her own machine to participate, that machine becomes the master node (node 0 in MPI). Otherwise, the first node on the list becomes a remote master, and the user's Launch Den Mother starts up and passes control to a remote Launch Den Mother.

During execution, errors detected by MacMPI.f are written to Fortran unit 2, which defaults to a file called FOR002.DAT. In MacMPI.c, the errors are written to a file called MPIerrs. This file should be examined if problems occur. Some errors may be due to the fact that our implementation of MPI is only partial. One error log entry is caused by a bug in the PPC Toolbox: it says that an Incomplete Read occurred, even though the expected and actual data received are the same. The MacMPI library has a workaround for this bug, so this error entry is for informational purposes only. Most MPI errors are fatal and will cause the program to halt.

After execution, there are usually output files created by the application. In most of our applications, only the master node produces any output. All the other nodes which have output data send it to the master node using MPI. Since the master node is usually either on the user's desk or in a common area available to everyone, there is no difficulty accessing these files. However, there may be output files in the AppleSeed folder on a remote computer. The simple way to retrieve remote output files is to use the "Get Files" feature of Launch Den Mother. It is also possible, but more complex, to access them using AppleShare (via the Chooser), if the owner of the remote computer allows it. To allow access, the owner first needs to turn File Sharing on in the File Sharing Control Panel and allow guests to connect to the computer in the Users & Groups Control Panel. To allow read-only access to the AppleSeed folder while disallowing access to all other files, the owner selects the AppleSeed folder, chooses Sharing from the File menu, selects Share this item, and allows Read Only access to Everyone.

The Launch Den Mother has additional features not described here, such as the ability to kill remote jobs. Further details can be found in the README documentation available with the distribution on our web site.

Manual Execution

Parallel applications can also be started manually. This is necessary if the Macintoshes are running a system earlier than MacOS 8.0.

MacMPI requires that the master node have a file called nodelist present in the same directory as the executable. If the parallel job is started manually, this file must be created by the user. (The Launch Den Mother utility creates this file automatically.) It is a plain text file. The first line contains a port name. If the name ppc_link is used, then the slave nodes do not need a nodelist file. (If some other port name is used, then the slave nodes need a nodelist file which contains only the port name.) The second line contains the name self. This name is required only if the cluster contains a single processor. Finally, the remaining lines consist of a list of computer names and zones, one name per line, in the form:

computer_name@zone_name

If there is only one zone in the AppleTalk network, the zone names are omitted. The names cannot have trailing blanks. A sample nodelist file is shown in Table III.

Table III.

Sample nodelist file

ppc_link
self
uclapic1
BarbH2@Physics-Plasma Theory
fried@Physics-Plasma Theory

To start the parallel job manually, one copies the executable to each node (via floppy disk, AppleShare, or ftp) and starts each executable by double-clicking it. The master must be started last.

Later Software Implementation

In order to make the Macintosh cluster more useful for parallel programming, we have added a visualization monitor to MacMPI. This small window shows which nodes are communicating, whether they are sending, receiving or both, as well as a histogram of the size of packets being sent.

We have also made available a number of working applications. The most interesting of these is the Parallel Fractal Demo (including an Interactive version) which runs on an arbitrary number of nodes. This demo is useful for beginners to make sure their network connections are all properly working, as well as for illustrating the speedups one obtains with parallel processing.

We have found that MacMPI based on PPC Toolbox is very reliable, since PPC Toolbox is very mature. However, it does not give optimum performance, since it was written in an earlier era when network speeds were much slower. The current performance is adequate for many of the large problems we do, but it limits the range and types of problems that can be run on the cluster. PPC Toolbox is based on the AppleTalk protocol only. Apple subsequently developed a communications library with higher performance, called Open Transport, which can be used either with AppleTalk or TCP/IP.

In order to obtain better performance, we decided to implement a new version of MPI based on Open Transport, called MacMPI_OT. Our initial implementation was still based on AppleTalk, because it is very user-friendly for the non-expert. A preliminary version of MacMPI_OT gives much improved performance (almost 5 times faster for large packets). However, we have found Open Transport difficult to work with, and MacMPI_OT is not yet complete. A beta version in Fortran77 and C is available on our web site for testing, and it works correctly with the Launch Den Mother.

There are a number of future directions one can take with MacMPI. Apple has announced that it is moving away from AppleTalk in favor of TCP/IP, and has promised to make TCP/IP as easy to use as AppleTalk is now. It would be fairly straightforward to convert our AppleTalk version of MacMPI_OT to TCP/IP, since Open Transport is deliberately protocol independent, although this would require modifying the Launch Den Mother utility. But there are other possibilities. Apple has announced a TCP/IP version of the PPC Toolbox, which may also give much higher performance. Furthermore, with the advent of MacOS X, existing Unix-based implementations of MPI could probably be ported to the Macintosh easily. Finally, MPI Software Technology [12] has announced support for Apple Orchards, with a fully compliant and optimized version of MPI. It is uncertain which path we will follow, but at least there are a number of promising possibilities.

Evaluation

The inexpensive, powerful cluster of Macintosh G3s has become a valuable addition to our research group. It is especially useful for student training and for running large calculations for extended periods. We have run simulations on 4 nodes for 100 hours at a time, using 1 GByte of memory. This is especially useful for unfunded research or exploratory projects, or when meeting short deadlines. The turnaround time for such jobs is often shorter than at supercomputer centers with more powerful computers, because we do not have to share this resource with the entire country. (Some problems, however, can only be run at the supercomputer centers because they are too large for the Macintosh cluster.)

The presence of the cluster has encouraged students and visitors to learn how to write portable, parallel MPI programs, which they can run later on larger computers elsewhere. In fact, since Fast Ethernet is slow compared to the networks used by large parallel supercomputers, our students are encouraged to develop better, more efficient algorithms that use less communication. Later, when they move the code to a larger parallel computer, the code scales very well with large numbers of processors. The cluster also encourages a more interactive style of parallel programming, in contrast to the more batch-oriented processing encouraged by traditional supercomputer centers.

Our current configuration has 4 machines located in one room, with a single monitor and monitor switch. These 4 Macs are a common resource and are available for computing all the time. The other 12 are located in various offices and are available when the owners of the machines decide to allow it, typically nights and weekends. The 4 nodes which are located together are ideal for debugging code and for long, extended calculations. The other 12 machines are useful for shorter, overnight calculations. It is also possible to use all 16 nodes in a single calculation, but in practice this is difficult because invariably some student is working late at night on his or her own machine. We plan to expand the cluster by adding 4-machine sub-clusters in various student and researcher offices and thus relieve pressure on the 4 common machines.

Because the cluster is used only by a small research group, we do not need sophisticated job management or scheduling tools (which may not even exist). Everything is done in the spirit of cooperation, and so far that has worked. (We don't have uncooperative people in our group, anyway!)

Why are we using the MacOS? Why not run Linux (a free Unix) on the Macs, for example? One reason is that we have always been Macintosh users and are very productive in the MacOS environment. There are good third party mathematical and numerical software packages, such as Mathematica, which run better on the Macintosh G3 than on our Unix workstations. Another reason is that many of the Macs are used for purposes other than numerical calculations and rely on software written for the MacOS. Furthermore, we find that the Mac environment makes it very easy to couple the output of our numerical codes to other MacOS software, such as Fortner's graphics packages, Mathematica, or QuickTime, or to programs we use for presentation, such as ClarisWorks or Microsoft Word. Finally, the MacOS has encouraged us to write software to a higher standard, with more of a Mac "look and feel" (such as the Launch Den Mother).

Linux, in comparison, is far more difficult for the novice to use than the Mac. Substantial Unix expertise is required to correctly install, maintain, and run a Unix cluster. Indeed, reference [1] discusses many of the details one needs to worry about.
In contrast, with the Mac cluster, the only non-standard items required are a single library, MacMPI, and a single utility, Launch Den Mother. Everything else works right out of the box: just plug it in and connect it.

Because of its ease of use, the Macintosh cluster is particularly attractive to small groups with limited resources. For example, high school students are learning how to run parallel applications on clusters they built themselves [13].

What are the problem areas? The most common failure has nothing to do with the Macintosh computers: it is the network. Symptoms of this failure are nodes appearing and disappearing in the list of available nodes in the Launch Den Mother utility, or errors returned by MacMPI indicating a node is unreachable. The most common cause of this failure is a mismatch in the duplex settings. Fast Ethernet can send either full duplex, where a node can simultaneously send and receive, or half duplex, where it cannot. The Ethernet adapters on each computer should have the same settings as the switch or hub. Hubs can only send half duplex, so the adapters should be set accordingly. Switches and adapters are often designed to auto-negotiate the duplex setting, but we have seen evidence that this auto-negotiation can fail and produce a mismatch. Switches and adapters can often be set to force a duplex setting, but we have seen cases where this also leads to mismatches.

The only way to be sure is to check the actual settings. Generally switches have lights on their front panels indicating whether full or half duplex is being used. More sophisticated switches also have software which will tell you the settings of each port, and whether there are a high number of errors. Ethernet adapters sometimes also have lights and software that can tell you the settings. The built-in Ethernet adapters from Apple have neither, but they are preset to half duplex, and there is an extension available from Apple to change this setting. If problems are occurring, one should first check these settings. If they are mismatched, it is sometimes necessary to reset the switch and/or the Macintosh to fix it. On less sophisticated switches, the only way to reset them is to pull the plug. Similar networking problems occur on non-Macintosh computers; they are typically more noticeable with tightly coupled clusters because such clusters tax the network more severely.

Another problem area is the generation of remote Fortran or C run-time errors by running applications, especially if they happen on computers located behind closed doors. Some run-time errors abort gracefully, but they require user interaction to terminate (e.g., clicking "OK" to continue), which may not be possible on a remote machine. Others are less graceful and hang the machine. The Apple Network Assistant software allows one to click OK or reboot a remote Macintosh, if it is not "too dead." However, Network Assistant can be very invasive of the computer owner's privacy, and some people do not wish to have it installed on their machine. We do not have any good solutions for these problems, except to encourage students to debug their codes on the 4 nodes which are located together and only run on remote machines after the code has been tested. Nevertheless, unexpected errors still happen, especially with student-written software.

The future continues to look bright. The recently announced Macintosh G4 has a main processor which is more optimized for floating point calculations, and a vector co-processor (designed for multimedia) that can calculate at a rate of 3.2 GFlops, single precision only (1 GigaFlop = 1000 MegaFlops). Networking is also becoming cheaper and faster, with Gigabit networks and FireWire being two interesting candidates for improving performance. Finally, with MacOS X, a large body of Unix-based software should also become available.

Acknowledgements

We wish to acknowledge the useful advice given to us in the early days of AppleSeed by Myron Krawczuk, Cliff McCollum, Johan Berglund, Chris Thomas, Pete Nielsen, and Paul Hoffman. Subsequent help has been given by Tim Kelly, Dirk Froehling, Louis Lerman, and Tim Parker. This work was supported by NSF contracts DMS-9722121 and PHY 93-19198 and DOE contracts DE-FG03-98DP00211, DE-FG03-97ER25344, DE-FG03-86ER53225, and DE-FG03-92ER40727.

References

[1] T. L. Sterling, J. Salmon, D. J. Becker, and D. F. Savarese, How to Build a Beowulf, [MIT Press, Cambridge, MA, USA, 1999].

[2] V. K. Decyk, "Benchmark Timings with Particle Plasma Simulation Codes," Supercomputer 27, vol V-5, p. 33 (1988).

[3] V. K. Decyk, "Skeleton PIC Codes for Parallel Computers," Computer Physics Communications 87, 87 (1995).

[4] See http://www.absoft.com/

[5] Apple Computer, Inside Macintosh: Interapplication Communication [Addison-Wesley, Reading, MA, 1993], chapter 11.

[6] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference [MIT Press, Cambridge, MA, 1996].

[7] R. D. Sydora, V. K. Decyk, and J. M. Dawson, "Fluctuation-induced heat transport results from a large global 3D toroidal particle simulation model," Plasma Phys. Control. Fusion 38, A281 (1996).

[8] K.-C. Tzeng, W. B. Mori, and T. Katsouleas, "Electron Beam Characteristics from Laser-Driven Wave Breaking," Phys. Rev. Lett. 79, 5258 (1997).

[9] William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface [MIT Press, Cambridge, MA, 1994].

[10] Lon Poole, MacWorld Mac OS 8 Bible [IDG Books Worldwide, Foster City, CA, 1997], chapter 17.

[11] See ftp://ftp.apple.com/developer/Tool_Chest/OS_Utilities/

[12] See http://www.mpi-softtech.com/

[13] Dennis Taylor, "Apples are the Core of These Clusters," IEEE Concurrency, vol. 7, no. 2, April-June, 1999, p. 7.