CPU cycles as the most precious data centre resource...
"Virtualised multi-core, multi-threaded servers are impatient. They want data for the apps in their virtual machine instantly. It's odd; now that CPU cycles are cheaper than ever before, they are treated as the most precious resource in the data centre."
Rarely have I seen a case of missing the point more thoroughly than this quote. It's not that people are worried about "wasting" cheap CPU. It's that disk I/O latency has failed to keep up with processors. As a consequence, many applications are bottlenecked by the storage. I know of many apps which spend 90% or more of their time I/O bound. So the issue is not "wasting" CPU time, but simply not getting through workloads fast enough or not giving quick enough end-user response times. It's not the CPU time that's being seen as precious, but the poor end user who's sitting in a call centre with an impatient customer.
As for some sort of direct-access memory storage model over PCIe being faster, of course it is. But just how relevant is that to most large real-world apps? Firstly, it's an extremely good idea to establish a clear distinction between persistent storage and volatile working space. For this you need clear and controlled access methods with controlled APIs that can provide for security, protection from rogue apps, clean restart points, data sharing and much else. Those APIs can be at various levels - blocks, files, database etc. - but exist they must. Those are also ideal points to establish shared access models and are hence ideal break points for network access (as anybody who's worked on large application architectures can agree).
Then there is the issue of just how many apps can exploit ultra-low latency. I know of many real-world apps which are I/O bound at 10ms access times. However, I know of none which would be in that position with 100-microsecond latency, an access time perfectly within the bounds of what can be achieved on common network protocols, such as FC or 10Gbps Ethernet. Indeed, in many cases the largest element of that delay is not the time on the wire, but the time spent navigating the software stacks.
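To put rough numbers on that, here is a back-of-envelope sketch. The workload is entirely made up for illustration: a transaction that needs 10ms of CPU and 9 synchronous storage accesses. The function names are hypothetical too. Only the two latency figures (10ms and 100 microseconds) come from the argument above.

```python
def io_bound_fraction(cpu_ms, io_count, io_latency_ms):
    """Fraction of a transaction's elapsed time spent waiting on storage.

    Assumes the I/Os are synchronous and serialised, so their latencies
    simply add up (no overlap with CPU work, no caching).
    """
    io_ms = io_count * io_latency_ms
    return io_ms / (cpu_ms + io_ms)


def elapsed_ms(cpu_ms, io_count, io_latency_ms):
    """Total elapsed time per transaction under the same assumptions."""
    return cpu_ms + io_count * io_latency_ms


if __name__ == "__main__":
    CPU_MS = 10.0   # assumed CPU time per transaction (illustrative only)
    IO_COUNT = 9    # assumed number of synchronous I/Os (illustrative only)

    for label, latency_ms in (("10 ms disk", 10.0), ("100 us networked", 0.1)):
        frac = io_bound_fraction(CPU_MS, IO_COUNT, latency_ms)
        total = elapsed_ms(CPU_MS, IO_COUNT, latency_ms)
        print(f"{label}: {total:.1f} ms elapsed, {frac:.0%} waiting on I/O")
```

With these assumed figures the transaction is 90% I/O bound at 10ms per access, and under 10% I/O bound at 100 microseconds. At that point shaving the storage latency further buys very little, because the CPU time dominates.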
This is not to deny that there might be some specialist uses of direct memory persistent access models over PCIe, but these are likely to be very low-level functions, perhaps within the networked storage units themselves (although dealing with single-point-of-failure issues would be important there).