AltiVec Fractal Demo Benchmarks
The G4/450's of our AppleSeed Macintosh cluster achieved the
following performance running the June 2000 version of the
AltiVec Fractal Demo IP, computing the default z^4
fractal image (single precision) with the Maximum Count set to 65536. This
code, compiled with MacMPI_IP.c
using Metrowerks CodeWarrior Pro 5, decomposes the problem into interlaced
lines, which spreads the computationally expensive regions of the image across
all processors and results in efficient parallelism. It also fills as many bubbles
in the instruction pipeline as it can, so that the G4's AltiVec
unit is used efficiently:
| Number of G4/450's | MFlops without AltiVec | MFlops using AltiVec * |
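To illustrate why interlaced lines parallelize well, here is a minimal sketch (not the demo's own code) that assigns image row r to processor r mod nproc and compares it with cutting the image into contiguous bands. The per-row cost profile, in which rows 6 to 9 cross an expensive part of the fractal, is a made-up assumption for illustration only.

```python
def interlaced_rows(height, rank, nproc):
    """Rows computed by processor `rank` when row r is assigned to r % nproc."""
    return list(range(rank, height, nproc))


def block_rows(height, rank, nproc):
    """Rows computed by `rank` when the image is cut into contiguous bands."""
    band = -(-height // nproc)  # ceiling division
    return list(range(rank * band, min(height, (rank + 1) * band)))


def max_load(cost, assign, nproc):
    """Work done by the busiest processor, which sets the wall-clock time."""
    height = len(cost)
    return max(sum(cost[r] for r in assign(height, p, nproc)) for p in range(nproc))


if __name__ == "__main__":
    # Hypothetical cost profile: rows 6-9 cross an expensive region of the fractal.
    cost = [100 if 6 <= r <= 9 else 1 for r in range(16)]
    print("ideal     :", sum(cost) / 4)
    print("bands     :", max_load(cost, block_rows, 4))
    print("interlaced:", max_load(cost, interlaced_rows, 4))
```

With this profile the busiest processor does 202 units of work under contiguous bands but only 103 under interlacing, which equals the ideal share of 412 / 4.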
We also ran the June version of the AltiVec Fractal Demo IP on the same problem using the AltiVec* instruction units in the UCLA Statistics Department's cluster of 16 G4/400's. Increasing the Maximum Count parameter (MaxCount) raises the amount of computation per pixel while the size of the messages stays constant, so the ratio of computation to communication grows.
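The following sketch shows why MaxCount changes the work but not the message size. It assumes that the "z^4" image means iterating z -> z^4 + c from z = 0 with an escape radius of 2; that formula, the radius, and the function names are illustrative assumptions, not taken from the demo. Points that never escape always cost MaxCount iterations, while each image row still returns one count per pixel.

```python
def escape_count(c, max_count, radius=2.0):
    """Iterations of z -> z**4 + c (assumed formula), starting from z = 0,
    before |z| exceeds `radius`, capped at `max_count`."""
    z = 0j
    for n in range(max_count):
        if abs(z) > radius:
            return n
        z = z**4 + c
    return max_count  # never escaped: this point cost the full MaxCount


def row_work_and_message(cs, max_count):
    """Return (iterations spent on one image row, values sent back for it)."""
    counts = [escape_count(c, max_count) for c in cs]
    return sum(counts), len(counts)


if __name__ == "__main__":
    row = [0j, 3 + 0j]  # 0 never escapes; 3 escapes after one step
    for max_count in (4096, 16384, 65536):
        print(max_count, row_work_and_message(row, max_count))
```

The work for this toy row grows from 4097 to 65537 iterations as MaxCount rises, while the message stays at two values.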
| Number of G4/400's | MFlops (MaxCount=4096) | MFlops (MaxCount=16384) | MFlops (MaxCount=65536) |
* This flop performance calculation shows "Honest" MegaFlops in two significant respects:
We hope that doesn't happen too often, but just in case, this code has mechanisms to properly flag the elements and tally only the flops actually used to compute the pixels. That is, as soon as an element has finished, any further work on that element is not counted, even though the AltiVec hardware continues to crank away on it. Counting those extra, unused flops would make the calculated performance jump by up to a factor of two for some images.
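The flagging described above can be sketched by emulating a 4-lane vector in plain Python. This is a minimal illustration, not the demo's implementation: the z -> z^4 + c iteration and the per-iteration cost `FLOPS_PER_ITERATION` are hypothetical placeholders. Every lane keeps stepping until the whole vector is done ("hardware" flops), but a lane stops contributing to the "honest" tally as soon as it is flagged as finished.

```python
LANES = 4                 # one AltiVec register holds four single-precision floats
FLOPS_PER_ITERATION = 15  # hypothetical cost of one step plus escape test


def vector_escape(cs, max_count, radius=2.0):
    """Iterate z -> z**4 + c (assumed formula) on up to LANES points in
    lockstep. Returns (counts, honest_flops, hardware_flops)."""
    if len(cs) > LANES:
        raise ValueError("at most LANES points per vector")
    zs = [0j] * len(cs)
    counts = [None] * len(cs)  # None marks a lane that has not finished yet
    honest = hardware = 0
    for n in range(max_count):
        # Flag lanes that escaped before this step.
        for i, z in enumerate(zs):
            if counts[i] is None and abs(z) > radius:
                counts[i] = n
        active = [i for i in range(len(cs)) if counts[i] is None]
        if not active:
            break  # every lane has finished, so the vector loop stops
        # The hardware steps every lane, finished or not ...
        hardware += FLOPS_PER_ITERATION * len(cs)
        # ... but only unfinished lanes count as honest flops.
        honest += FLOPS_PER_ITERATION * len(active)
        # The emulation updates only active lanes (avoiding overflow);
        # the hardware tally above still charges the finished ones.
        for i in active:
            zs[i] = zs[i] ** 4 + cs[i]
    counts = [max_count if k is None else k for k in counts]
    return counts, honest, hardware


def mflops(flops, seconds):
    """Convert a flop tally and elapsed time to MFlops."""
    return flops / (seconds * 1e6)
```

For example, `vector_escape([0j, 3+0j, 3+0j, 3+0j], 100)` returns counts `[100, 1, 1, 1]`, 1545 honest flops, and 6000 hardware flops: one slow lane keeps the whole vector busy, but only its own work is credited.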
On February 6, we established a new milestone with AppleSeed.
We were able to run a 100 million particle 3D electrostatic PIC simulation
on an 8-node Macintosh G4/450 dual-processor cluster. The total time
was 17.8 seconds per time step, with a grid of 128x128x256. We used Bedros
Afeyan's Polymath 2000 cluster,
which has 1 GB of memory per node, since we don't have any machines large
enough at UCLA to do the job. The current cost of such machines is
less than $2500/node. Only 5 or 6 years ago, such calculations
required the world's largest supercomputers.
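To put the run in perspective, this small calculation derives per-cell, per-node, and throughput figures from the numbers quoted above. It assumes the particles are spread evenly over the nodes and over both processors in each node, which the text does not state; the function name is hypothetical.

```python
def pic_step_summary(particles, nodes, cpus_per_node, grid, seconds_per_step):
    """Back-of-envelope figures for one PIC time step, assuming particles
    are spread evenly over nodes and processors."""
    nx, ny, nz = grid
    cells = nx * ny * nz
    return {
        "cells": cells,
        "particles_per_cell": particles / cells,
        "particles_per_node": particles / nodes,
        "particles_per_cpu": particles / (nodes * cpus_per_node),
        "particles_per_second": particles / seconds_per_step,
    }


if __name__ == "__main__":
    s = pic_step_summary(100_000_000, 8, 2, (128, 128, 256), 17.8)
    for key, value in s.items():
        print(f"{key:>20}: {value:,.1f}")
```

This gives 4,194,304 grid cells, about 24 particles per cell, 12.5 million particles per node, and roughly 5.6 million particle updates per second across the cluster.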

Comparison of Apple's Gigabit Ethernet (1000BaseT) adapter with the earlier Fast Ethernet (100BaseT) adapter on two dual-processor G4/450's running OS 9. The measured bandwidth (MBytes/sec) is for two processors, connected by a crossover cable, exchanging data as a function of message size. The results show that Gigabit Ethernet is more than 3 times faster than Fast Ethernet.
http://exodus.physics.ucla.edu/appleseed/appleseed.html
last update: April 17, 2001