Optimising for PowerPC/AltiVec/Cell
Forgive the x86-centric readers commenting so far; it's clear that they don't understand PPC/AltiVec or the other vector units on the Cell.
It's true that PPC/AltiVec/Cell could stand some optimisation when running PPC Linux, as the old Mac PPC coders obviously didn't want to undermine the AltiVec optimisations inside OS X.
Reading the linked cluster page, http://gravity.phy.umassd.edu/ps3.html, it strikes me that he does in fact appear to have used a generic PPC Linux and didn't bother to look at optimisations beyond the compiler.
He apparently didn't even consider the PPC coders' choice, 'PPC Gentoo', where all your current AltiVec-optimised multimedia code is first produced by the likes of lu_zero and the AltiVec guys at Power dev.
http://www.powerdeveloper.org/forums/viewtopic.php?t=1082
Take a look at the old-school PPC guys' code thread for a practical, tried and tested code base:
http://www.powerdeveloper.org/forums/viewtopic.php?t=1426&postdays=0&postorder=asc&start=0
http://www.powerdeveloper.org/forums/viewtopic.php?t=1494
It's clear that PPC Linux DOES NOT currently use any AltiVec vector optimisations (the way x86 Linux does with its limited MMX vector unit).
If you read some of the AltiVec threads at powerdeveloper, you will find some answers and numbers. The lads are always looking for feedback on the likes of the freevec optimised codebase, http://bbrv.blogspot.com/2008/02/freevec-updated.html, and they help new users and PPC programmers better understand the PPC/Cell and their vector optimisations, which you might find informative and practical.
Look at this chart from the thread above for an indication of a generic memcpy/network speed-up, for instance.
-----------
gunnar:
Quick update:
I have looked at the Linux Kernel code a bit.
It's not difficult to improve the performance on PPC.
The Linux kernel has a copy function which is used to copy data between kernel and user space.
As this function copies a lot of data, its performance has a direct influence on network and filesystem performance.
Improving its speed was actually easy, as you can see:
http://www2.greyhound-data.com/gunnar/glibc/throughput_970.gif
http://www2.greyhound-data.com/gunnar/glibc/throughput_cell.gif
Especially on the Cell, improving this function
results in noticeable performance improvements for the whole Linux system.
I'll do some more testing and then publish the patch soon.
Cheers
Gunnar
-----------------------------
Posted: Wed Nov 14, 2007 4:43 am Post subject: Possible benefits - optimization for PowerPC
--------------------------------------------------------------------------------
Hello,
Almost all Linux applications are developed in C or C++. People often believe that C compilers are good enough to guarantee good performance. Unfortunately this is not the case; especially on PowerPC, manual optimization can make a huge difference.
Here is an example of a memcpy on PowerPC:
a) Normal C routine working on Byte
150 MB/sec
b) Normal C routine working on Long (32bit)
800 MB/sec
c) Normal C routine working on quad (64bit)
1000 MB/sec
** This is the best performance that you can achieve by algorithm design alone, using the C language **
d) Normal C routine working on quad (64bit), with two ASM cache instructions added
1380 MB/sec
e) ASM routine better optimized for this PPC architecture
2750 MB/sec
From 150 MB/sec to 2750 MB/sec is quite a difference.
As you can see, by using optimized code you can achieve close to 20 times better performance (2750 / 150 is roughly 18x)!
Gunnar
----------------------------------
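To make the first steps of Gunnar's list a bit more concrete, here is a minimal C sketch of what variants (a) and (c) could look like: one loop that moves a byte at a time and one that moves 64 bits at a time. The function names copy_bytes and copy_words64 are made up for illustration and are not taken from Gunnar's code or from glibc. The cache-instruction and hand-written ASM variants (d) and (e) are not shown. A modern compiler may well vectorise or replace either loop, so don't expect to reproduce the 2007 numbers from this alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant (a), illustrative only: copy one byte per loop iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));
    while (n--)
        *d++ = *s++;
}

/* Variant (c), illustrative only: copy 64 bits per loop iteration,
 * then finish any remaining tail one byte at a time.
 * The fixed-size memcpy calls are a portable way to express a 64-bit
 * load and store without alignment or aliasing problems; compilers
 * normally turn each of them into a single load or store. */
void copy_words64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* 64-bit load  */
        memcpy(d, &w, sizeof w);   /* 64-bit store */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                    /* 0..7 leftover bytes */
        *d++ = *s++;
}
```

Both functions assume the source and destination buffers do not overlap, as memcpy does. To compare them, call each on the same large buffer many times and time the loops. The gap between the two is the kind of difference the list above describes between (a) and (c).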