Consistent performance

Our Physon HPC cluster (website in Bulgarian only) originally consisted of four nodes built on dual-socket Xeon MoBos, with DDR (20 Gbps) InfiniBand links as the interconnect fabric for MPI message passing. With eight quad-core Xeon E5335 processors (2.0 GHz, 2x4 MB L2 cache), the system reached an Rmax of 209.0 GFLOPS on the HPL benchmark at a problem size Nmax of 74880. The theoretical peak performance (Rpeak) is 256 GFLOPS, so the old nodes achieve an efficiency (Rmax/Rpeak) of 0.816.
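The Rpeak figure comes from multiplying the total core count by the clock and the number of FLOPs each core can issue per cycle. Here is a minimal Python sketch of that arithmetic, assuming that each E5335 core retires 4 double-precision FLOPs per cycle (a 2-wide SSE add plus a 2-wide SSE multiply). The helper name `rpeak_gflops` is hypothetical.

```python
# Theoretical peak (Rpeak) and HPL efficiency for the original Physon nodes.
# Assumption: each Xeon E5335 core retires 4 double-precision FLOPs per cycle
# (2-wide SSE add + 2-wide SSE multiply).
FLOPS_PER_CYCLE = 4


def rpeak_gflops(nodes, sockets_per_node, cores_per_socket, ghz):
    """Hypothetical helper: total cores x clock (GHz) x FLOPs per cycle."""
    return nodes * sockets_per_node * cores_per_socket * ghz * FLOPS_PER_CYCLE


old_rpeak = rpeak_gflops(nodes=4, sockets_per_node=2, cores_per_socket=4, ghz=2.0)
old_rmax = 209.0  # measured HPL result at N = 74880
old_eff = old_rmax / old_rpeak

print(f"Rpeak = {old_rpeak:.1f} GFLOPS, efficiency = {old_eff:.3f}")
# prints "Rpeak = 256.0 GFLOPS, efficiency = 0.816"
```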

Now, about a year later, we have received a shiny new addition: eight more dual-socket MoBos with the newer quad-core Xeon E5420 (2.5 GHz, 2x6 MB L2 cache). Since four of the new nodes were already installed and powered on, I ran the HPL test on them to see how the new processors compare with the old ones. The four new nodes reached an Rmax of 256.2 GFLOPS for the same matrix size. With a theoretical peak Rpeak of 320 GFLOPS, the efficiency remains good at 0.801, which means 1.226x the performance for 1.25x the CPU clock frequency.
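The same arithmetic can be run for both sets of four nodes side by side. This is again a sketch, assuming 4 double-precision FLOPs per cycle per core for both CPU models; `rpeak_gflops` is a hypothetical helper, and the Rmax values are the measured HPL results quoted in the text.

```python
# Compare four Xeon E5420 nodes with four Xeon E5335 nodes (HPL, N = 74880).
# Assumption: 4 double-precision FLOPs per cycle per core for both CPU models.
FLOPS_PER_CYCLE = 4


def rpeak_gflops(nodes, sockets_per_node, cores_per_socket, ghz):
    """Hypothetical helper: total cores x clock (GHz) x FLOPs per cycle."""
    return nodes * sockets_per_node * cores_per_socket * ghz * FLOPS_PER_CYCLE


old = {"ghz": 2.0, "rmax": 209.0}  # Xeon E5335 nodes
new = {"ghz": 2.5, "rmax": 256.2}  # Xeon E5420 nodes

for name, run in (("E5335", old), ("E5420", new)):
    # 4 nodes x 2 sockets x 4 cores at the given clock
    run["rpeak"] = rpeak_gflops(4, 2, 4, run["ghz"])
    run["eff"] = run["rmax"] / run["rpeak"]
    print(f"{name}: Rpeak = {run['rpeak']:.1f} GFLOPS, efficiency = {run['eff']:.3f}")

speedup = new["rmax"] / old["rmax"]      # measured performance ratio
clock_ratio = new["ghz"] / old["ghz"]    # clock frequency ratio
print(f"speedup = {speedup:.3f}x for {clock_ratio:.2f}x the clock")
```

The efficiency drops by only about 0.015 while the clock rises by 25%.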

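The efficiency barely moved (0.816 vs 0.801) even though each node now has 25% more peak FLOPS, which suggests that the InfiniBand fabric keeps up with the faster processors. This makes it clear why most cluster systems in the Top500 list use the (now) relatively cheap InfiniBand as their interconnect fabric instead of the more sophisticated and much more expensive proprietary interconnects.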