My experience with ScaleMP is not very good. I tested software which ran beautifully on a 24-core Opteron machine (6 quad-core CPUs), with an 18-22x speed-up, crunching through a multi-scale analysis of a 1.2 gigapixel image in 60 seconds. On a 64-core ScaleMP box (4 x 16 cores, if I am right) performance was DISMAL. As more threads are added, the performance tends to drop severely. On a single thread I would get a timing of, say, 60 seconds for a smallish data set; on 2 threads it took anything up to 5 minutes. The scheduler NEVER puts two threads of the same program on a single board, but scatters them far and wide. You can only gain speed-up on these boxes if you have many light-weight processes which do not need to share much memory. Did we not have clusters for that?
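
To put those numbers side by side, here is a rough sketch using the usual definitions: speed-up is serial time divided by parallel time, and efficiency is speed-up divided by the number of threads. The function names are made up for illustration. The 300 seconds is the worst case quoted above ("up to 5 minutes"), not a measured average, and the Opteron figures are taken straight from the 18-22x range on 24 cores.

```python
def speedup(t_serial, t_parallel):
    """Speed-up relative to the single-thread run (values below 1 mean a slowdown)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(t_serial, t_parallel) / n_threads


# ScaleMP box: about 60 s on 1 thread, up to 300 s (5 minutes) on 2 threads.
scalemp_speedup = speedup(60.0, 300.0)
scalemp_efficiency = efficiency(60.0, 300.0, 2)

# Opteron box: the speed-up itself was reported (18x to 22x on 24 cores),
# so efficiency is just that range divided by the core count.
opteron_efficiency = (18 / 24, 22 / 24)

print(f"ScaleMP, 2 threads: speed-up {scalemp_speedup:.2f}, "
      f"efficiency {scalemp_efficiency:.0%}")
print(f"Opteron, 24 cores: efficiency {opteron_efficiency[0]:.0%} "
      f"to {opteron_efficiency[1]:.0%}")
```

In other words, in the worst case the second thread made the job about five times slower, while the Opteron box kept roughly three quarters or more of ideal scaling across all 24 cores.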