
Massive parallelism has little applicability to the average computer user, who already owns more CPU horsepower than they can use.
The push toward massive parallelism is driven in small part by a desire to run a host of virtual machine sessions for server applications and, to a lesser extent, by a need for faster speeds in scientific computing.
The primary driver is marketing - a means by which the various CPU manufacturers can compete for bragging rights in the industry.
My desktop here is a dual-core 64-bit AMD CPU running in 32-bit mode. It is currently, and typically, displaying a static screen and performing some keystroke processing. Over its life, the CPUs in this machine will deliver a trillionth or less of their theoretical computational output.
This is the reality of the typical desktop.
Since transistor sizes are about to hit a brick wall, and since CPU clock rates hit a brick wall a decade ago, it makes the most sense for manufacturers to look toward parallelism for increasing machine speeds. Multi-core CPUs (complex CPUs) are going to hit a brick wall at something around 16 cores. Now where, the Roadmapsters ask, can the industry get more computational power?
The most obvious answer is the not-so-general-purpose silicon on the video chips - which, as a result of existing parallelism, is already pumping out more raw calculations than several thousand general purpose CISC CPUs.
The easiest way for the hardware guys to implement massive parallelism is to exploit the existing graphics hardware - tweaking it so that it is more compiler and programmer friendly.
Intel's take on the issue is to replace the graphics CPU with a more general purpose computing engine that can handle graphics as well - tweaked for producing improved graphics performance.
Which method is more applicable to general purpose computing?
Intel's roadmap. Clearly.
Which is more flexible and more readily upgraded?
Intel's roadmap. Clearly.
Which method will win in the long term?
Intel's roadmap. Clearly.
Now the question is... Since this massive parallelization will be available to all desktops as a natural consequence of having a video processor - no matter whose roadmap is taken... How will 1,000 times more general purpose computing power than now exists on the desktop (in an exploitable form) be used, when existing CPUs are already underutilized by many orders of magnitude?
The answer to that is distributed computing.
Desktops will be enlisted, as a matter of course, to provide distributed computational resources to network service subscribers.
Need to model the folding of a protein? Just rent the time from 2% of the unutilized computational capacity of South Korea.
Need to model the atmosphere with a resolution of a meter? Just rent the time from 15% of the North American computational infrastructure.
AI bot not smart enough for you? Just offload its peak processing requirements to the PCs along your street, or across the city.
Now which Road Map best conforms to such general purpose computing requirements?
Intel's. Clearly.
AMD has fumbled badly with the purchase of ATI, although they are in some sense better off because they have now absorbed a potential rival.
AMD had better start generalizing the CPUs on their derivative ATI graphics cores very rapidly.
But even doing so they run into the problem of running two radically different programming paradigms, while with Intel there is essentially still only one core instruction set to deal with.
So even if being first to market wins some market share, Intel wins in the long run.
And what of the motherboard chipset? Well, it's going to be shrunk to an IO block, and all of its computational power is going to come from the former video co-processors as well.
So future machines are going to consist of two major components: a main processor with a low count of parallel CISC cores, and a rapidly reconfigurable massively parallel core for graphics and other massively parallel computing, which also uses a few cores to provide the smarts for most of the IO tasks.
Since a programming dichotomy will continue to exist between the massively parallel environment and the CISC computing environment, and since the OS is not well suited to micromanaging the allocation of alien cores and dozens of threads for various IO functions, it makes the most sense to partition the OS into two components as well, with one OS subsystem managing the massively parallel cores, and another, more traditional system managing the CISC cores.
The programming paradigm will be partitioned similarly. Traditional low thread count parallelism will be implemented within the language that targets the CISC cores, and the massively parallel components will be paged into place by the component of the OS that manages the massively parallel cores, which will run their task as a batch job and return the results to the data space of the main cores.
What you are <NOT> going to see is an intrusion of massively parallel constructs into existing programming languages. You won't see much in the way of fine grained parallelism from within the application framework.
What you will see are OS calls to load massively parallel computing configurations along with calls to load up shared memory regions with the data to process and a call to run and return the result as a batch process.
It will look like this:
CALL Configure_Sort_Structure
OSCALL Push_Current_Parallel_Configuration()
OSCALL Configure_Parallel_Sort(MaxPracticalCPU)
OSCALL Run_Parallel_Sort(SortStructure)
OSCALL Pop_Parallel_Configuration()
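The push / configure / run / pop sequence can be sketched in Python. This is only an illustration under stated assumptions: `ParallelCoreManager`, its methods and `sort_as_batch_job` are hypothetical names standing in for the OS subsystem, and a thread pool stands in for the massively parallel cores, so it shows the shape of the calls rather than any real speedup.

```python
import heapq
import os
from concurrent.futures import ThreadPoolExecutor


class ParallelCoreManager:
    """Hypothetical stand-in for the OS component that owns the parallel cores."""

    def __init__(self):
        self.current = None   # active configuration, or None
        self._saved = []      # stack of previously active configurations

    def push_configuration(self):
        # Save whatever the cores are currently configured to do.
        self._saved.append(self.current)

    def configure_parallel_sort(self, n_cores):
        # Assumption: a configuration is just a kind plus a core count.
        self.current = {"kind": "sort", "n_cores": max(1, n_cores)}

    def run_parallel_sort(self, data):
        # Batch job: split the data, sort the chunks on the workers, merge.
        if self.current is None or self.current["kind"] != "sort":
            raise RuntimeError("cores are not configured for sorting")
        n = self.current["n_cores"]
        chunk = max(1, -(-len(data) // n))  # ceiling division
        parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        with ThreadPoolExecutor(max_workers=n) as pool:
            sorted_parts = list(pool.map(sorted, parts))
        return list(heapq.merge(*sorted_parts))

    def pop_configuration(self):
        # Restore the configuration that was active before the batch job.
        self.current = self._saved.pop()


def sort_as_batch_job(manager, data):
    """Application-side view: one batch call, result returned to the caller."""
    manager.push_configuration()
    try:
        manager.configure_parallel_sort(os.cpu_count() or 1)
        return manager.run_parallel_sort(data)
    finally:
        manager.pop_configuration()
```

The application never sees the chunking or the workers; it hands over a data structure and gets the sorted result back, and whatever the cores were doing before (3D rendering, say) is restored afterwards.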
The programming language for the massively parallel core section of the code will look less traditional, and most probably won't even be touched by most programmers, as they will rely on a library of batch processing functions from within the multiprocessing component of the OS itself.
Pre-installed components will include, in part:
Return_Cores_Available()
Configure_3d_Engine(n_cores,Version)
Configure_2d_Engine(n_cores,Version)
Configure_Audio(n_cores,Version)
Configure_Lan(Version)
Configure_USB(Version)
Configure_Serial_ATA(Version)
Configure_Base_IO(Version)
Configure_Parallel_Search(n_cores,Version)
Configure_Parallel_Sort(n_cores,Version)
Configure_Neural_Net(n_cores,Version, height,width,depth,input array, output array)
Configure_RayTrace(n_cores,Version)
Etc...
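One way to picture how Return_Cores_Available() and the Configure_* calls might interact is as a fixed budget of cores that the engines draw from. The sketch below is an assumption, not a description of any real OS: `CorePool`, its method names and the 256-core total are all made up for illustration.

```python
class CorePool:
    """Hypothetical budget of parallel cores shared by pre-installed engines."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.allocations = {}  # engine name -> (n_cores, version)

    def return_cores_available(self):
        used = sum(n for n, _ in self.allocations.values())
        return self.total_cores - used

    def configure(self, engine, n_cores, version):
        # Reconfiguring an engine first releases its previous allocation.
        previous = self.allocations.pop(engine, None)
        if n_cores > self.return_cores_available():
            if previous is not None:
                self.allocations[engine] = previous  # leave old setup intact
            raise ValueError(f"not enough free cores for {engine}")
        self.allocations[engine] = (n_cores, version)

    def release(self, engine):
        self.allocations.pop(engine, None)


pool = CorePool(256)                         # assumed core count
pool.configure("3d_engine", 128, version=2)
pool.configure("audio", 8, version=1)
free_cores = pool.return_cores_available()   # 256 - 128 - 8 = 120
```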
And with a networked computing infrastructure you are going to have:
Request_CPU_Resources(n_cores,Application_For_Purchase_Struct)
External_Configure_Neural_Net(n_cores,Version,Height,width,depth, input Array, output array,Limiting_Price = 2 cents)
External_Link_To_Global_AI_Engine(AI_Name,Consciousness_Instance, Limiting_price = 3 cents)
External_Submit_Stimulus_Global_AI_Engine(Sense_Struct)
External_Return_AI_Status(*Response_Struct)
Etc....
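The Limiting_Price arguments imply some kind of matching between what a requester will pay and what idle desktops are asking. A minimal sketch of that matching follows, assuming the price is quoted per core and that the cheapest offers are taken first. `Offer`, its fields and this version of `request_cpu_resources` are hypothetical, and no real network call is made.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str        # hypothetical identifier for a desktop or group of them
    cores: int           # idle cores the provider is willing to rent out
    price_cents: float   # assumed to be the asking price per core


def request_cpu_resources(offers, n_cores, limiting_price_cents):
    """Fill a request from the cheapest offers at or under the price limit.

    Returns a list of (provider, cores_taken) pairs, or None if the request
    cannot be filled within the limit.
    """
    affordable = sorted(
        (o for o in offers
         if o.cores > 0 and o.price_cents <= limiting_price_cents),
        key=lambda o: o.price_cents,
    )
    plan, needed = [], n_cores
    for offer in affordable:
        if needed == 0:
            break
        taken = min(offer.cores, needed)
        plan.append((offer.provider, taken))
        needed -= taken
    return plan if needed == 0 else None
```

A request that cannot be met under the limit simply comes back empty-handed, which leaves the caller free to raise its price, wait, or run the job locally.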
And that, my little children, is the outline of your basic computational future for the next 100 years.