The Eternal Return
With all due respect to the Unices and the Linuces, their lineage is dominated by mainframe thinking: a single, highly valuable CPU being divided up amongst multiple tasks to obtain maximum utilization of a scarce resource.
The price of a CPU has plummeted, but few people have shaken off the old assumption that CPUs are still best exploited as shrunken, multiuser, multitasking mainframes, with all the baggage that this implies. Symmetric multiprocessing (SMP) hardware has been accommodated to some degree by improvisation on the monolithic OS designs, but multicore architecture will look a lot more like shrunken PCs on a network: suitably simplified for on-die networking and the like, but with increasing amounts of private, per-core RAM that SMP doesn't really address.
Trying to take advantage of massively multicored hardware while dragging single-processor and SMP baggage along will necessarily produce its share of backward-looking monsters and things indigest. The recent claim by Stonebraker et al. that 10X to 20X improvements in database performance may be had is based on the assumption of a (gasp!) single-threaded database application running on a dedicated core with gobs of dedicated RAM. This Bolshevik approach to the application-versus-kernel threading debate assumes that we will soon be living in a one-thread-per-core world, at least in terms of application design. Although some "housekeeping chore" cores may multithread at the application or the kernel level, new designs like H-Store will throw out almost a half-century of mainframe-think and seize an entire processor for themselves without inconveniencing other applications on the die. (Just think of all the context-switching overhead that will suddenly disappear.)
What will the software infrastructure for these dedicated cores look like? My guess is that the winning software architecture will be microkernels churning away in their individual cores, loosely coupled with each other via a message-passing system. Although this approach exacts a price in terms of message-passing overhead, it more than repays it in parallelism and scalability unobtainable with the monolithic and SMP approaches.
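To make the shape of that guess concrete, here is a minimal sketch of the pattern in Python. It is an illustration under loud assumptions, not a description of H-Store or of any real microkernel: ordinary threads stand in for dedicated cores, a `queue.Queue` stands in for the on-die message channel, and the names `Core`, `Cluster`, `call`, and `shutdown` are all hypothetical. The point is only that each "core" runs a single thread over state nobody else can touch, and that everything else reaches that state by sending a message.

```python
import queue
import threading


class Core:
    """Hypothetical stand-in for one dedicated core:
    one thread, one inbox, and private state that no other thread touches."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = queue.Queue()   # the only way in from the outside
        self._store = {}             # private "per-core RAM"
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Single-threaded loop: no locks are needed around _store,
        # because requests are handled strictly one at a time.
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "put":
                self._store[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(self._store.get(key))


class Cluster:
    """Routes each key to the one core that owns it (simple hash partitioning)."""

    def __init__(self, n_cores):
        self.cores = [Core(i) for i in range(n_cores)]

    def _owner(self, key):
        return self.cores[hash(key) % len(self.cores)]

    def call(self, op, key, value=None):
        # Send a request message and block until the owning core replies.
        reply = queue.Queue(maxsize=1)
        self._owner(key).inbox.put((op, key, value, reply))
        return reply.get()

    def shutdown(self):
        # Ask every core to stop and wait for each acknowledgement.
        for core in self.cores:
            reply = queue.Queue(maxsize=1)
            core.inbox.put(("stop", None, None, reply))
            reply.get()


if __name__ == "__main__":
    cluster = Cluster(n_cores=4)
    cluster.call("put", "account:42", 100)
    print(cluster.call("get", "account:42"))
    cluster.shutdown()
```

Every request pays for a round trip through two queues, which is the message-passing overhead mentioned above. In exchange, adding cores adds capacity without adding any shared lock for them to fight over.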
I've seen the future, and it works: the Tandem K-Series, designed in the 1970s, pioneered the massive commercial application of loosely coupled message-passing kernels. For about two decades, the K-Series (and later the S-Series) processed most of the world's credit and debit card transactions, and powered major stock exchanges as well. (For all I know, they still may.)
Just as CPU architecture evolved from simple to complex instruction sets and then fell back to reduced instruction sets in response to hardware evolution, the time approaches when OS and application architectures may experience a similar return to their roots.