AppleSeed seems to have stirred quite a discussion, which
we appreciate. It's nice to see the word get out there.
Although people often answer their own questions when
they build their own clusters (please email
us if you do!), we've received many questions, so here
are some answers gathered in one place for everyone to see:
Almost the only troubleshooting question we get is:
-
I followed your instructions on my n-node system (where
n>4), but something went wrong as it started up. What do I do?
We suggest testing your system two computers at a time first.
You should be able to isolate the problem to one computer. Typically, that
one computer isn't configured correctly, has a faulty network port, or
has a faulty cable.
The remaining burning questions regarding AppleSeed are:
-
Are you getting support from Apple?
Apple is showing some interest. We've talked to a few
Apple representatives, who spoke positively about our project. So far they've
lent us, for benchmarking, one G3/333, one G3/400, and a pair of beta-unit
dual-processor G4s. They've given us publicity in their Apple
University Arts newsletter, and they've twice given us passes to their
Worldwide Developers Conference. We are beginning to have a regular
dialogue, but even more would be helpful. (If we can get the attention
of Steve Jobs, et al., that would be great!)
-
Are you getting support from
anyone else?
Starnine has been
kind enough to provide a copy of their WebSTAR web server application.
We appreciate their generosity and are using it right now on an iMac to
serve you this web site.
Asanté has
generously donated an eight-port 100BaseT switch for our project.
Louis M. Lerman has personally and very generously donated
a sixteen-port 100BaseT switch for our project.
Otherwise, to buy the computers and other equipment, the
research group pooled a little here and a little there from a variety of
preexisting grants provided by the Department
of Energy and the National Science Foundation
for scientific research.
-
Has anyone else built a
similar cluster?
Yes. A list of the other groups we know of
that have built their own clusters based on our work is available here.
In addition, descriptions from high school and junior high school students
are available here.
Another list of users of Macintosh clusters is available at
the Pooch Users page.
Just follow the instructions provided in the one-page AppleSeed
Recipe. Complete details on what we did to make it work are in the AppleSeed
Report.
References and source code can be found at
the AppleSeed Development Page
and in
the Pooch Software Development Kit.
Additional information can be found at
the Pooch Documentation Page.
-
Can I build a Mac cluster
running OS X?
Yes. We have Carbon versions of our software available and
fully operational for OS 9 with CarbonLib
and the latest version of OS X. We have even successfully run parallel jobs
on mixed Mac clusters running OS 9 and OS X. (Can you imagine attempting that
with Linux?)
Regarding X, we highly recommend using version 10.2.1 or later
of OS X. Previous versions of OS X had significant bugs. Today, Apple has fixed
all but the smallest bugs, so you can use our software either on OS 9 with
CarbonLib
or on the latest OS X.
-
Will the cluster run my copy
of ABC application by XYZ corporation faster?
That software needs to be rewritten to take advantage of
multiple Macs and interprocessor communications. However, there are standard
ways to communicate, such as the MPI library, so that you can send messages
the same way on every parallel computer.
There are already quite a lot of programs written using
MPI in the scientific community, which run on the IBM SPs, the Cray T3Es,
Linux clusters, Origin 2000s, etc. We have written an MPI library, MacMPI,
so that such programs could also run on Macs.
The difficulty of writing a parallel program depends very
much on the problem. If it is just a question of a master assigning pieces
of a large task to individual processes which do not have to communicate
until they are finished, it is probably fairly straightforward. If you
know how to program, you might look at a simple program such as knock.f
or knock.c on our web site.
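As a rough illustration of the master/worker pattern described above, here is a minimal sketch in Python. It is not MacMPI and not taken from knock.f or knock.c: a thread pool stands in for the separate Macs, and the names `work_on_piece` and `master` are hypothetical. The point is only that the master cuts the task into pieces, the workers never talk to each other, and the results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def work_on_piece(piece):
    """Worker: process one independent piece (here, a sum of squares)."""
    start, stop = piece
    return sum(i * i for i in range(start, stop))


def master(total, n_workers):
    """Master: cut the task into pieces, farm them out, combine the results."""
    if total <= 0:
        return 0
    step = -(-total // n_workers)  # ceiling division so every item is covered
    pieces = [(lo, min(lo + step, total)) for lo in range(0, total, step)]
    # Workers do not communicate with each other; they only report back.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(work_on_piece, pieces))
    return sum(partial_results)
```

On a real cluster the pieces would travel to other machines as messages, but the division of labor would look much the same.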
If you have the source code, you can begin parallelizing
it, but since your ABC application was written by XYZ, the programmers
at XYZ would have to do it. We suggest that you email that company and
suggest to them that they write their code for parallel Mac clusters. You
are welcome to point out our
AppleSeed web site to the company for inspiration and a demonstration
of what's possible.
-
Why bother with the monitor,
or the Mac OS, or a keyboard, etc.? Why not rack-mounted?
For versatility. We are making use of the full flexibility
that is available to Macs: for both individual and cluster
use, right off the bat.
-
By implementing a combination of dedicated Power Macs and
Macs on people's desks, we can use some Macs individually while the other
Macs are running parallel jobs, or, for bigger problems, we can add the
Macs on the desks at night or over the weekend.
-
In addition, we can phase in new Macs for computation and
redistribute older dedicated Macs for individual use. Over time, reusing
the older hardware becomes very cost-efficient.
The Macintosh OS and hardware give us this kind of versatility
and flexibility, right out of the box.
This just in:
Now that
Apple's XServe
is shipping to customers, you may certainly mix and match rack-mounted
XServes with your desktop Macs. The possibilities are endless.
The combination of the PowerPC G3 processor and the Absoft
Fortran compiler is the key that gives such incredible results with our
physics codes, sometimes beating a 450 MHz DEC Alpha, rated at 900 MFlops
peak. Use of the G4 is even better. It's very rare to find a processor/compiler
combination that gives such surprisingly good results (e.g., a high percentage
of "peak" speed). From what we can tell, the PowerPC line is among
the most balanced CPU architectures, unlike Alpha and Pentium. We
mean balanced in the sense that, for example, the CPU's I/O and processing
units are both fast enough, and the cache is substantial enough, to keep up
with each other.
In addition, setup and installation are trivial: after
connecting your Macs to a switch or hub, toggle a few switches (as few
as two, depending on your setup) in the Mac OS, compile your code with
MacMPI, and run. And maintenance is the same as for any other Mac network,
that is, virtually nothing (no frayed cables, etc.). It's truly "plug and
play."
-
Will this work with iMacs?
Yes, absolutely, positively. Just buy a 100BaseT hub
or switch and hook up the iMacs. Instant, very pretty, supercomputer.
(Just set them up and run your parallel code.) The Statistical Mathematics
department at UCLA has done just that: An iMac student lab by day, a supercomputer
by night.
-
How much will the G4 improve your
results?
We currently have Power
Mac G4s forming the dedicated portion of our cluster. The Velocity
Engine (formerly known as AltiVec) is a vector processor. Members of our
group have been using Cray vector processors efficiently since the
early 1980s, so we believe we can make good use of the G4 with our codes and
techniques. Fractal Demos utilizing
AltiVec are available.
-
Can I use more than four Macs?
Does MacMPI "scale"?
Yes. MacMPI itself operates independently of the number of
nodes. Just add Macs. As for any other parallel platform, your code must
be able to organize your problem into that many pieces.
A parallel system "scales" when you add twice as many
components and it still works almost twice as fast. The longer that pattern
continues, the "better" it "scales". How well your code will "scale" depends
much more on your problem and your network than on MacMPI itself.
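To make the definition of "scaling" above concrete, the sketch below computes speedup and parallel efficiency from run times. The function names are hypothetical and the timings are invented for illustration; they are not measurements from our cluster.

```python
def speedup(time_on_one, time_on_n):
    """How many times faster the run is on n nodes than on one node."""
    return time_on_one / time_on_n


def efficiency(time_on_one, time_on_n, n_nodes):
    """Fraction of the ideal speedup achieved; 1.0 means perfect scaling."""
    return speedup(time_on_one, time_on_n) / n_nodes


# Hypothetical timings in seconds, for illustration only (not measurements).
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
for n, t in timings.items():
    print(n, round(speedup(timings[1], t), 2), round(efficiency(timings[1], t, n), 2))
```

A code "scales well" for as long as the efficiency stays close to 1.0 when the node count doubles.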
For a demonstration of the potential scalability of a Mac cluster, please
consider this
article about
76 dual-processor G4s
at USC achieving 1/5 TeraFlop. We believe our code and approach scale very well.
If your problem requires a great deal of interprocessor
communication, then it will "scale better" on a Cray than on a cluster.
The network will become the bottleneck at a greater number of nodes in
the Cray. (After all, a faster network is a lot of what you're paying for
in a Cray.)
However, if your problem requires almost no interprocessor
communication (for example, cranking through Monte Carlo calculations or
completely independent pixels) then the cluster will "scale" very well,
because almost all the time is spent computing and the network almost never
becomes a bottleneck.
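The trade-off between computing and communicating can be sketched with a deliberately crude model. This is an assumption made for illustration, not a description of MacMPI or of any real network: compute time divides evenly across nodes, while communication time grows with the number of peers each node must talk to. All names and numbers below are hypothetical.

```python
def modeled_run_time(n_nodes, compute_seconds, comm_seconds_per_peer):
    """Toy model: compute divides across nodes, communication grows with peers."""
    compute = compute_seconds / n_nodes
    communicate = comm_seconds_per_peer * (n_nodes - 1)
    return compute + communicate


def best_node_count(compute_seconds, comm_seconds_per_peer, max_nodes=64):
    """Node count (1..max_nodes) with the smallest modeled run time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda n: modeled_run_time(n, compute_seconds, comm_seconds_per_peer),
    )
```

In this model, a problem with expensive communication stops improving at a modest node count, while a problem with negligible communication keeps improving up to the largest node count tried.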
Now, a good parallel programmer would always try to keep
the use of the network to a minimum, because communication will always
be a bottleneck at some level. So if you train yourself on a cluster, with
its comparatively limited network, then your code will run that much better
when you get the chance to run it on a Cray. And then, because administrators
of such large computers like to see their hardware being used efficiently,
you'll be even more likely to get time on that Cray again in the future
because your code ran so well...
-
Can I use older Power Macs?
Yes. You can, for example, use an existing 10BaseT network
of older Macintoshes based on the PowerPC. Obviously, it will run a little
slower.
-
What is, and why are you using,
the MPI programming interface?
Message-Passing Interface (MPI) is a standard programming
interface for sending messages between processors. MPI is on most
parallel machines on the planet, like the Cray T3 series, IBM SP series,
Intel Paragon, SGI Origin, Fujitsu, etc. The idea is that anybody
can run the same source code that ran on those parallel machines
on a Mac cluster, with no modification. (We didn't change our physics
codes.)
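For readers new to message passing, the toy sketch below shows the basic idea in Python. It is not MPI and not MacMPI: two threads play the role of two processors, queues play the role of the network, and the helper names `send`, `recv`, and `run_two_ranks` are hypothetical. In a real MPI program the corresponding standard calls would be `MPI_Send` and `MPI_Recv`.

```python
import queue
import threading


def run_two_ranks():
    """Two 'ranks' exchange messages through mailboxes, one queue per rank."""
    mailboxes = [queue.Queue(), queue.Queue()]  # mailboxes[r] holds messages for rank r
    results = {}

    def send(dest, message):
        mailboxes[dest].put(message)

    def recv(rank):
        return mailboxes[rank].get()  # blocks until a message arrives

    def rank0():
        send(1, [1.0, 2.0, 3.0])   # hand some data to rank 1
        results["total"] = recv(0)  # wait for rank 1's answer

    def rank1():
        data = recv(1)              # wait for data from rank 0
        send(0, sum(data))          # send the result back

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]
```

Because each rank only sends and receives, the same program logic does not care whether the other rank is on the same machine or across a network, which is the portability that a standard interface provides.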
-
When major higher performance
computers already exist, what purpose does such a cluster serve?
A Mac cluster is not going to replace the Crays, but complement
them. Our focus is those people who already have parallel code and
people who want to start writing parallel code. The cluster is appropriate
for the jobs too big to run on one processor and too small to require getting
time on a big supercomputer. Here are a few examples:
-
Suppose you needed to do a run to prove a method for a grant
proposal, but you don't have the time to wait or money to pay for CPU time
on a Cray? It's a catch-22: you need the grant to get the time to
use the Cray. Answer: Use the Mac cluster instead.
-
If you have a dedicated cluster, it could give you a better
turnaround time because the Crays are busy or your permitted run time is
limited.
-
Running many different small jobs on a large parallel computer
is not a good use of resources. This type of cluster can serve to prevent
those smaller jobs from bogging down the big hardware.
-
Managing a Mac cluster is much easier.
Our cluster has the computing power of the best supercomputers
of eight years ago, so any parallelized problems that pushed the envelope
then can be done today very simply and cheaply.
-
How much of MPI have you implemented?
Currently 45 of the most commonly used MPI routines.
Many of the remaining MPI calls can be built from our MacMPI library, but
since our codes do not use them, we haven't had a reason to implement
them.
Linux is a fine OS, and the motivations that led to its
creation are noble, but it does have a few drawbacks for our purposes:
-
With the MacOS, we can run mainstream apps during the day,
and simply start a job for the night. Rebooting into another OS for
computations is completely unnecessary. Linux does not offer us that
kind of flexibility.
-
We are aware of experience from Beowulf clusters. Some of
those clusters have been disabled or unusable due to compiler incompatibilities,
hardware incompatibilities, MPI incompatibilities, and security problems.
Even other computers not on the cluster in the same department were hacked
into through one Beowulf cluster, so they had to shut the cluster down.
All the while, those clusters have been unproductive. (We are not naming
names so we don't embarrass the innocent.) (See our
science page for our results.)
-
We are not Linux experts. We have other things to do
besides learning the subtleties of Linux, and we don't have the money to
pay a system administrator to be a Linux expert. (Please factor that
into your price/performance comparisons.) For example, with the
Beowulf project, there is a specific procedure to clone the OS and all
the drivers from one box to another. A Macintosh cluster is just
as easy to set up and administer as any other Mac network. That is,
we don't have to administer anything at all, nada, nothing, zip.
Just plug it in and go.
-
Did you pare down the Mac
OS to its bare bones or make any special modifications?
Nope. We're running standard Mac OS 9 while evolving towards
OS X. MacMPI
is a library that you compile with your code that calls routines built
into the Mac OS.
-
Doesn't the MacOS get in the
way?
No, actually, on OS 9, it's easy to get it out of our way.
Because the classic Mac OS doesn't do "true multitasking", we can almost entirely take over the
CPU. We are aware that on certain "true multitasking" OS's, the real
(wall-clock) time a job takes can randomly exceed its CPU time by factors
of two to four. And how many parallel computer platforms can you think
of that can start a job using "drag-and-drop"? Using
Pooch with
MacMPI, we can do that, and that's the kind of user
friendliness that we are trying to keep.
Also, these days, OS X's underpinnings are
BSD Unix,
complete with preemptive multitasking and protected memory,
so the question is moot if you are using OS X.
MacMPI_X is a Carbon library that uses Apple's Open Transport
implementation of TCP/IP for communications. All of our main programs now
use that library or its predecessor, MacMPI_IP. Both are now
available.
The original MacMPI calls the Program-to-Program Communications
Toolbox (PPC Toolbox), which has been in the Mac OS since 1990. The PPC
Toolbox currently runs over the AppleTalk networking protocol.
-
Why are you using Open Transport?
A version of MacMPI uses Open
Transport over AppleTalk (see our other page),
and it already achieves, for long messages, a large fraction (significantly
greater than the PPC Toolbox version) of the theoretical peak speed of
100BaseT. The TCP/IP version achieves
even greater performance.
-
Will MacMPI run on Mac OS
X? Will it run better or worse?
Because OS X does not support the PPC Toolbox, it is impossible
to use the original MacMPI on Mac OS X.
Performance of MacMPI_X on OS X looks good so far, but
OS X 10.1 does seem to run into a few hiccups at specific message sizes.
We have not yet
confirmed that later versions of OS X have all of these problems fixed, but
we and employees of Apple are investigating the matter.
-
Why not use all the
computers on the Internet? Why not use distributed computing solutions
like Sun's Jini, or distributed.net,
or SETI@Home?
Distributed computing is a system where many computers connected
to a sparse network, typically the Internet, are each given a specific
piece of a problem. It works very well for problems that can be cut
up into completely independent pieces. For example, in the RC5 project,
where the task is to break an encrypted message by trying one key at a
time, testing one key does not depend on the result of any other key.
Likewise, in 3D rendering, one pixel can be computed independently of
any other.
Parallel computing platforms are for large problems where
there are significant interdependencies. In our physics codes, we
have anywhere from hundreds to millions of charged particles interacting
through electromagnetic fields, and what they do depends on the information
about all the others. Consequently, each processor requires information
from all the other processors before proceeding to the next time step.
CPU time between time steps typically ranges from a fraction of a second to
20 seconds or so, depending on the size of the problem.
The information to be delivered can be on the order of
megabytes, making these problems impractical to implement over modem connections.
Any delays or unreliability in getting packets across the Internet would
halt the entire process. So that's why we need some kind of reliable,
responsive, local, fast networking, like 10BaseT at the very minimum.
We're using 100BaseT, and we're seeing if there's a way to use FireWire.
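A back-of-the-envelope calculation shows why the network matters here. The sketch below estimates the ideal time to move one megabyte over different links, using nominal peak rates and ignoring latency and protocol overhead. The 56 kbit/s modem rate is an assumption chosen for comparison, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(n_bytes, bits_per_second):
    """Ideal time to move n_bytes over a link, ignoring latency and overhead."""
    return n_bytes * 8 / bits_per_second


# Nominal peak rates; the 56 kbit/s modem figure is an assumption for comparison.
LINKS = {
    "56k modem": 56_000,
    "10BaseT": 10_000_000,
    "100BaseT": 100_000_000,
}

message_bytes = 1_000_000  # one megabyte, the order of magnitude quoted above
for name, rate in LINKS.items():
    print(f"{name}: {transfer_seconds(message_bytes, rate):.2f} s")
```

Even at these ideal rates, a megabyte takes a couple of minutes over such a modem, far longer than the CPU time between time steps, while on 10BaseT or 100BaseT it takes under a second.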
-
What is your cluster's
current hardware configuration?
The cluster consists of eight beige G3/266's, four beige
G3/300's, four "blue & white" G3/350's, eight G4/450's, two dual-processor
G4/450's, one PowerBook G3/333, and one "All-in-One" G3/266. Each desktop
computer uses an Asanté 100Mbps Fast Ethernet PCI card to connect
to a Cisco Catalyst 2900 Series XL. The two dual-processor Macs and four
G4/450's are in one office, four G4's are in another, while the remaining
computers are distributed on desks (on three different floors) for members
of our group.
-
What is the most bizarre kind of
question you get?
The oddest questions we get come from people who don't seem to
believe we've done what we've done. They either don't think
our results are relevant or say that we are being deceptive. Questions
like, "How can you possibly compare FLOPs from a Mac to FLOPs from a Cray?
Supercomputer FLOPs are fundamentally different..." or "Your G3 results
couldn't possibly be better than an IBM SP. That simply can't be the truth..."
(Even though the G3 was based on the POWER architecture used in the IBM
SP.)
I don't know how to answer questions like that. All I
can suggest is, try it yourself! Fortunately,
the vast majority of questions and comments are much more supportive, intelligent,
and forward-thinking. Keep 'em coming!
Last updated: October 9, 2002
http://exodus.physics.ucla.edu/appleseed/appleseedfaq.html