Frequently Asked Questions about AppleSeed

Almost the only troubleshooting question we get is: 
I followed your instructions on my n-node system (where n>4), but something went wrong as it started up. What do I do?
The remaining burning questions regarding AppleSeed are:
Are you getting support from Apple?
Are you getting support from anyone else?
Has anyone else built a similar cluster?
How do I build one?
Can I build a Mac cluster running OS X?
Will the cluster run my application faster?
Why bother with the monitor, or the Mac OS, or a keyboard, etc.?  Why not rack-mounted?
Why the Mac hardware?
Will this work with iMacs?
How much does the G4 improve your results?
Can I use more than four Macs? Does MacMPI "scale"?
Can I use older Power Macs?
What is, and why are you using, the MPI programming interface?
When major high-performance computers already exist, what purpose does such a cluster serve?
How much of MPI have you implemented?
Why not Linux?
Did you pare down the Mac OS to its bare bones or make any special modifications?
Doesn't the Mac OS get in the way?
How about TCP/IP?
Why are you using Open Transport?
Will MacMPI run on Mac OS X? Will it run better or worse?
Why not use all the computers on the Internet? Why not use distributed computing solutions like Sun's Jini, or distributed.net, or SETI@Home?
What is your cluster's current hardware configuration?
What are the most bizarre kinds of questions you get?

AppleSeed seems to have stirred quite a discussion, which we appreciate.  It's nice to see the word get out there.

People often answer their own questions when they build their own clusters (please email us if you do!), but we've also received many questions, so here are the answers, collected in one place for everyone to see:
 
 

Almost the only troubleshooting question we get is: "I followed your instructions on my n-node system (where n>4), but something went wrong as it started up. What do I do?"

We suggest testing your system two computers at a time first. You should be able to isolate the problem to one computer. Typically, that one computer isn't configured correctly, has a faulty network port, or has a faulty cable.
 
 

The remaining burning questions regarding AppleSeed are:

Are you getting support from Apple?

Apple is showing some interest. We've talked to a few Apple representatives, who spoke positively about our project. So far they've lent us, for benchmarking, one G3/333, one G3/400, and a pair of beta-unit dual-processor G4s. They've given us publicity in their Apple University Arts newsletter, and they've twice given us passes to their Worldwide Developers Conference. We are beginning to have a regular dialogue, but even more would be helpful. (If we can get the attention of Steve Jobs, et al., that would be great!)
 
Are you getting support from anyone else?

StarNine has been kind enough to provide a copy of their WebSTAR web server application. We appreciate their generosity and are using it right now on an iMac to serve you this web site.

Asanté has generously donated an eight-port 100BaseT switch for our project.

Louis M. Lerman has been very generous in personally donating a sixteen-port 100BaseT switch for our project.

Otherwise, to buy the computers and other equipment, the research group pooled a little here and a little there from a variety of preexisting grants provided by the Department of Energy and the National Science Foundation for scientific research.
 
 

Has anyone else built a similar cluster?

Yes. A list of the other groups we are aware of who have built their own clusters based on our work is available here. In addition, descriptions from high school and junior high school students are available here. Another list of users of Macintosh clusters is available at the Pooch Users page.
How do I build one?

Just follow the instructions provided in the one-page AppleSeed Recipe. Complete details on what we did to make it work are in the AppleSeed Report. References and source code can be found at the AppleSeed Development Page and in the Pooch Software Development Kit. Additional information can be found at the Pooch Documentation Page.
Can I build a Mac cluster running OS X?

Yes. We have Carbon versions of our software available and fully operational for OS 9 with CarbonLib and the latest version of OS X. We have even successfully run parallel jobs on a mixed Mac cluster running both OS 9 and OS X. (Can you imagine attempting that with Linux?)

Regarding OS X, we highly recommend using version 10.2.1 or later. Earlier versions of OS X had significant bugs. Today, Apple has fixed all but the smallest of them, so you can use our software on OS 9 with CarbonLib or with the latest OS X.
 

Will the cluster run my application faster?

Only if it is rewritten to take advantage of multiple Macs and interprocessor communication. However, there are standard ways to communicate, such as the MPI library, so that you can send messages the same way on every parallel computer.

There are already quite a lot of programs written using MPI in the scientific community, which run on the IBM SPs, the Cray T3Es, Linux clusters, Origin 2000s, etc. We have written an MPI library, MacMPI, so that such programs could also run on Macs.

The difficulty of writing a parallel program depends very much on the problem. If it is just a question of a master assigning pieces of a large task to individual processes which do not have to communicate until they are finished, it is probably fairly straightforward. If you know how to program, you might look at a simple program such as knock.f or knock.c on our web site.
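
For a flavor of what that master/worker pattern looks like in practice, here is a minimal sketch using standard MPI calls. It is not the actual knock.c from our site; the "work" each process does (squaring the number it is handed) is a stand-in of our own choosing, and it assumes the job is started on at least two processes:

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[])
  {
     int rank, nproc, i, piece;
     double total = 0.0, partial;
     MPI_Status status;

     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &nproc);

     if (rank == 0) {
        /* master: hand each worker one piece of the task... */
        for (i = 1; i < nproc; i++)
           MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        /* ...then collect the partial results */
        for (i = 1; i < nproc; i++) {
           MPI_Recv(&partial, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
           total += partial;
        }
        printf("combined result = %g\n", total);
     } else {
        /* worker: receive a piece, compute on it independently, send it back */
        MPI_Recv(&piece, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        partial = (double) piece * piece;   /* stand-in for real work */
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
     }

     MPI_Finalize();
     return 0;
  }

Because the workers never talk to each other, only to the master at the start and the end, this is about as easy as parallel programming gets.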

If you have the source code, you can begin parallelizing it yourself; but if your ABC application was written by company XYZ, then the programmers at XYZ would have to do it. We suggest that you email that company and ask them to write their code for parallel Mac clusters. You are welcome to point out our AppleSeed web site to the company for inspiration and a demonstration of what's possible.
 

Why bother with the monitor, or the Mac OS, or a keyboard, etc.? Why not rack-mounted?

For versatility. We are making use of the full flexibility available to Macs, for both individual and cluster use. The Macintosh OS and hardware give us that versatility right out of the box.

This just in: Apple's Xserve is now shipping to customers, and you can certainly mix and match rack-mounted Xserves with your desktop Macs. The possibilities are endless.
 

Why the Mac hardware?

The combination of the PowerPC G3 processor and the Absoft Fortran compiler is the key that gives such incredible results with our physics codes, sometimes beating a 450 MHz DEC Alpha rated at 900 MFlops peak. Use of the G4 is even better. It's very rare to find a processor/compiler combination that gives such surprisingly good results (e.g., a high percentage of "peak" speed). From what we can tell, the PowerPC line is among the most balanced CPU architectures, unlike the Alpha and Pentium. We mean balanced in the sense that, for example, the CPU's I/O and processing units are fast enough, and the cache substantial enough, to keep up with each other.

In addition, setup and installation are trivial: after connecting your Macs to a switch or hub, toggle a few settings (as few as two, depending on your setup) in the Mac OS, compile your code with MacMPI, and run. And maintenance is the same as for any other Mac network, that is, virtually nothing (no frayed cables, etc.). It's truly "plug and play."
 

Will this work with iMacs?

Yes, absolutely, positively. Just buy a 100BaseT hub or switch and hook up the iMacs. Instant, very pretty, supercomputer. (Just set them up and run your parallel code.) The Statistical Mathematics department at UCLA has done just that: an iMac student lab by day, a supercomputer by night.
 
How much does the G4 improve your results?

We currently have Power Mac G4s forming the dedicated portion of our cluster. The Velocity Engine (also known as AltiVec) is a vector processor, and members of our group have been using Cray vector processors efficiently since the early 1980s, so we believe we can make good use of the G4 with our codes and techniques. Fractal Demos utilizing AltiVec are available.
 
Can I use more than four Macs? Does MacMPI "scale"?

Yes. MacMPI itself operates independently of the number of nodes. Just add Macs. As with any other parallel platform, your code must be able to organize your problem into that many pieces.

A parallel system "scales" when you add twice as many components and it still works almost twice as fast. The longer that pattern continues, the "better" it "scales". How well your code will "scale" depends much more on your problem and your network than on MacMPI itself.

For a demonstration of the potential scalability of a Mac cluster, please consider this article about 76 dual-processor G4s at USC achieving 1/5 TeraFlop. We believe our code and approach scale very well.

If your problem requires a great deal of interprocessor communication, then it will "scale" better on a Cray than on a cluster, because the Cray's network does not become the bottleneck until a much larger number of nodes. (After all, a faster network is a lot of what you're paying for in a Cray.)

However, if your problem requires almost no interprocessor communication (for example, cranking through Monte Carlo calculations or completely independent pixels), then the cluster will "scale" very well, because almost all the time is spent computing and the network almost never becomes a bottleneck.
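
A back-of-the-envelope way to see both cases (our own rule of thumb, not a measurement): let T_comp be the compute time of the whole job on one Mac, and T_comm(N) the time spent passing messages when the job is split across N Macs. Then, roughly,

  time on N Macs  ≈  T_comp / N  +  T_comm(N)
  speedup  =  (time on 1 Mac) / (time on N Macs)

If T_comm stays negligible (Monte Carlo, independent pixels), the speedup stays close to N and the cluster "scales" well. If T_comm grows with N, it eventually dominates and adding Macs stops helping; pushing that crossover out to larger N is exactly what a Cray's faster network buys you.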

Now, a good parallel programmer would always try to keep the use of the network to a minimum, because communication will always be a bottleneck at some level. So if you train yourself on a cluster, with its comparatively limited network, then your code will run that much better when you get the chance to run it on a Cray. And then, because administrators of such large computers like to see their hardware being used efficiently, you'll be even more likely to get time on that Cray again in the future because your code ran so well...
 
 

Can I use older Power Macs?

Yes. You can, for example, use an existing 10BaseT network of older Macintoshes based on the PowerPC. Obviously, it will run a little slower.
 
What is, and why are you using, the MPI programming interface?

The Message-Passing Interface (MPI) is a standard programming interface for sending messages between processors. MPI is available on most parallel machines on the planet, like the Cray T3 series, the IBM SP series, the Intel Paragon, the SGI Origin, Fujitsu machines, etc. The idea is that anybody can run the same source code that ran on those parallel machines on a Mac cluster, with no modification. (We didn't change our physics codes.)
 
When major high-performance computers already exist, what purpose does such a cluster serve?

A Mac cluster is not going to replace the Crays, but to complement them. Our focus is on people who already have parallel code and people who want to start writing it. The cluster is appropriate for jobs too big to run on one processor and too small to require getting time on a big supercomputer. Here are a few examples:
1. Suppose you need to do a run to prove a method for a grant proposal, but you don't have the time to wait or the money to pay for CPU time on a Cray. It's a catch-22: you need the grant to get the time on the Cray. Answer: use the Mac cluster instead.
2. If you have a dedicated cluster, it can give you better turnaround time when the Crays are busy or your permitted run time on them is limited.
  3. Running many different small jobs on a large parallel computer is not a good use of resources. This type of cluster can serve to prevent those smaller jobs from bogging down the big hardware.
  4. Managing a Mac cluster is much easier.
Our cluster has the computing power of the best supercomputers of eight years ago, so any parallelized problems that pushed the envelope then can be done today very simply and cheaply.
How much of MPI have you implemented?

Currently 45 of the most commonly used MPI routines. Many of the remaining MPI calls can be built from the routines in our MacMPI library, but since our codes do not use them, we haven't had a reason to implement them.
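
To illustrate what "built from our MacMPI library" means, here is a sketch (ours, written for this FAQ, not code taken from MacMPI) of how a collective call can be layered on top of plain sends and receives. We use a broadcast only because it is the simplest example; the routine name simple_bcast and the message tag 99 are arbitrary:

  #include "mpi.h"

  /* send root's buffer to every other process in the communicator,
     one point-to-point message at a time */
  int simple_bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm comm)
  {
     int rank, nproc, i;
     MPI_Status status;

     MPI_Comm_rank(comm, &rank);
     MPI_Comm_size(comm, &nproc);
     if (rank == root) {
        for (i = 0; i < nproc; i++)
           if (i != root)
              MPI_Send(buf, count, type, i, 99, comm);
     } else {
        MPI_Recv(buf, count, type, root, 99, comm, &status);
     }
     return MPI_SUCCESS;
  }

A real implementation would use a tree of messages rather than a simple loop, but the point is that nothing beyond the point-to-point routines is required.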
Why not Linux?

Linux is a fine OS, and the motivations that led to its creation are noble, but it does have a few drawbacks for our purposes:
1. With the Mac OS, we can run mainstream apps during the day and simply start a job for the night. Rebooting into another OS for computations is completely unnecessary. Linux does not offer us that kind of flexibility.
2. We are aware of the experience of some Beowulf clusters. Some of those clusters have been disabled or left unusable by compiler incompatibilities, hardware incompatibilities, MPI incompatibilities, and security problems. In one case, computers in the same department that were not even part of the cluster were hacked into through the Beowulf cluster, so the cluster had to be shut down. All the while, those clusters have been unproductive. (We are not naming names so we don't embarrass the innocent.) (See our science page for our results.)
3. We are not Linux experts. We have other things to do besides learning the subtleties of Linux, and we don't have the money to pay a system administrator to be a Linux expert. (Please factor that into your price/performance comparisons.) For example, with the Beowulf project, there is a specific procedure to clone the OS and all the drivers from one box to another. A Macintosh cluster is just as easy to set up and administer as any other Mac network. That is, we don't have to administer anything at all, nada, nothing, zip. Just plug it in and go.

Did you pare down the Mac OS to its bare bones or make any special modifications?

Nope. We're running standard Mac OS 9 while evolving towards OS X. MacMPI is a library that you compile with your code; it calls routines built into the Mac OS.
Doesn't the Mac OS get in the way?

No; actually, on OS 9, it's easy to get it out of our way. Because the classic Mac OS doesn't do "true multitasking", we can almost entirely take over the CPU. We are aware that, on certain "true multitasking" OSes, the real (wall-clock) time a job takes can randomly exceed the CPU time by factors of two to four. And how many parallel computer platforms can you think of that can start a job using "drag-and-drop"? Using Pooch with MacMPI, we can do that, and that's the kind of user friendliness that we are trying to keep.

Also, these days, OS X's underpinnings are BSD Unix, complete with preemptive multitasking and protected memory, so the question is moot if you are using OS X.
 

How about TCP/IP?

MacMPI_X is a Carbon library that uses Apple's Open Transport implementation of TCP/IP for communications. All of our main programs now use that library or its predecessor, MacMPI_IP. Both are now available.

The original MacMPI calls the Program-to-Program Communications Toolbox (PPC Toolbox), which has been in the Mac OS since 1990. The PPC Toolbox currently runs over the AppleTalk networking protocol.
 

Why are you using Open Transport?

We also have a version of MacMPI that uses Open Transport over AppleTalk (see our other page), and it already achieves, for long messages, a large fraction of the theoretical peak speed of 100BaseT (significantly greater than the PPC Toolbox version). The TCP/IP version achieves even greater performance.
Will MacMPI run on Mac OS X? Will it run better or worse?

Because OS X does not support the PPC Toolbox, it is impossible to use the original MacMPI on Mac OS X.

Performance of MacMPI_X on OS X looks good so far, but OS X 10.1 does seem to run into a few hiccups at specific message sizes. We have not yet confirmed that later versions of OS X fix all of these problems, but we and employees of Apple are investigating the matter.
 

Why not use all the computers on the Internet? Why not use distributed computing solutions like Sun's Jini, or distributed.net, or SETI@Home?

Distributed computing is a system where many computers connected to a sparse network, typically the Internet, are each given a specific piece of a problem. It works very well for problems that can be cut up into completely independent pieces. For example, in the RC5 project, where the task is to break an encrypted message by trying one key at a time, testing one key does not depend on the result of any other key; and in 3D rendering, one pixel can be computed independently of any other.

Parallel computing platforms are for large problems with significant interdependencies. In our physics codes, we have anywhere from hundreds to millions of charged particles interacting through electromagnetic fields, and what each one does depends on information about all the others. Consequently, each processor requires information from all the other processors before proceeding to the next time step. The CPU time between time steps typically ranges from a fraction of a second to 20 seconds or so, depending on the size of the problem.

The information to be delivered can be on the order of megabytes, making these problems impractical to implement over modem connections.  Any delays or unreliability in getting packets across the Internet would halt the entire process.  So that's why we need some kind of reliable, responsive, local, fast networking, like 10BaseT at the very minimum.  We're using 100BaseT, and we're seeing if there's a way to use FireWire.
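
To put rough numbers on that (round figures of our own choosing): a 1 megabyte message is about 8 million bits. Over a 56 kbps modem, that is roughly 8,000,000 / 56,000 ≈ 140 seconds per exchange, every time step. Over 100BaseT, even at only half its rated 100 Mbps, it is roughly 8,000,000 / 50,000,000 ≈ 0.16 seconds. With only a few seconds of computing between time steps, only the second number is tolerable.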
 

What is your cluster's current hardware configuration?

The cluster consists of eight beige G3/266's, four beige G3/300's, four "blue & white" G3/350's, eight G4/450's, two dual-processor G4/450's, one PowerBook G3/333, and one "All-in-One" G3/266. Each desktop computer uses an Asanté 100 Mbps Fast Ethernet PCI card to connect to a Cisco Catalyst 2900 Series XL switch. The two dual-processor Macs and four of the G4/450's are in one office, four more G4's are in another, while the remaining computers are distributed on the desks of members of our group (on three different floors).
What are the most bizarre kinds of questions you get?

The oddest questions we get are from people who seem not to believe we've done what we've done. They either don't think our results are relevant or say that we are being deceptive. Questions like, "How can you possibly compare FLOPs from a Mac to FLOPs from a Cray? Supercomputer FLOPs are fundamentally different..." or "Your G3 results couldn't possibly be better than an IBM SP. That simply can't be the truth..." (Even though the G3 was based on the POWER architecture used in the IBM SP.)

I don't know how to answer questions like that. All I can suggest is, try it yourself! Fortunately, the vast majority of questions and comments are much more supportive, intelligent, and forward-thinking. Keep 'em coming!
 

Last updated: October 9, 2002


http://exodus.physics.ucla.edu/appleseed/appleseedfaq.html