Re: Don't de-dupe your primary! && moar FUD
AC:
Dedupe isn't for every data set, but it works great for many of them. With VMDKs in particular, and other big-ass files, you can see up to 75% savings in some cases; add block compression and you can squeeze out even more impressive space savings. As for the performance penalty, it's just CPU, which is cheaper than memory. Since it runs as a low-priority background process, it affects performance about as much as disk scrubs do (they don't, FWIW).
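If that sounds hand-wavy, here's a back-of-envelope sketch of block-level dedupe (Python; the 4 KiB block size and SHA-256 fingerprint are my assumptions, not how any particular array actually does it):

    import hashlib, sys

    BLOCK = 4096  # assumed 4 KiB fixed blocks

    def dedupe_ratio(path):
        """Estimate dedupe savings for one file by hashing fixed-size blocks."""
        seen = set()
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(BLOCK)
                if not chunk:
                    break
                total += 1
                seen.add(hashlib.sha256(chunk).digest())  # fingerprint the block
        return total, len(seen)

    if __name__ == "__main__":
        total, unique = dedupe_ratio(sys.argv[1])
        saved = 100.0 * (total - unique) / total if total else 0.0
        print(f"{total} blocks, {unique} unique -> {saved:.1f}% savings")

Point it at a few of your own vmdks and see how redundant they really are.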
This means you can actually get more efficiency out of things like SSD, improving overall performance, since deduped data sets fit better in memory. As you point out, there is a constant discrepancy between the "extra" capacity big drives give you and the IOPS those same spindles can actually deliver. Dedupe + PAM (or just more memory, netapp plz kthnx) balances that tradeoff between the IOPS and capacity requirements.
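To put numbers on that discrepancy, some strictly made-up spindle math (every figure below is an illustrative assumption, not a vendor spec):

    # how many spindles do you buy: for capacity, or for IOPS?
    capacity_needed_tb = 100
    iops_needed = 20000

    sata = {"tb": 2.0, "iops": 80}     # big, slow
    sas15k = {"tb": 0.6, "iops": 180}  # small, fast

    for name, d in (("SATA", sata), ("15k SAS", sas15k)):
        for_capacity = capacity_needed_tb / d["tb"]
        for_iops = iops_needed / d["iops"]
        print(f"{name}: {for_capacity:.0f} spindles for capacity, "
              f"{for_iops:.0f} for IOPS")

    # SATA:   50 spindles for capacity, 250 for IOPS  <- IOPS-bound
    # 15k SAS: 167 for capacity, 111 for IOPS         <- capacity-bound

Either way you're buying spindles you don't need on one axis; dedupe plus a big cache is one way out of that bind.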
Geoff, Texas Todd, 3par and Compellent ....
You guys are awesome block heads (not meant as an offence, just pointing out that you only do block protocols), but you're missing a couple of things that are different in the HPC and, to a certain extent, the "cloud" space.
1) NFSv4.1 means the "clients" can do the tiering, and frankly they can do it better than any array can. Heck, you can even get clever with the automounter and NFSv3 to achieve similar results (see the sketch after this list). This magic block-tiering strategy is a head-scratcher for these environments.
2) I wouldn't describe any FC block-based system as massively parallel. Topping out at 8 or 16 controllers is not massively anything. I would argue that both the Compellent and 3par products are impressive, but not *MASSIVE*. NTAP, Ibrix, and Isilon are larger, but aren't *MASSIVE* either. Well, maybe Ibrix, but I digress.
3) Most virtual environments, even Netapp ones, keep virtual machines on different "tiers" of WAFL (a higher-level abstraction than "raid group" or "disk"), and cache-on-cache architectures are much more effective for this than having the array decide which blocks are hot for random-read workloads. Since Netapp arrays don't really have much in the way of write-performance issues, this auto-tiering doesn't do much for a corner case that might exist in a block environment.
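Re: point 1, the automounter trick is as dumb as it sounds, in a good way: point "hot" and "cold" trees at different arrays and let the clients do the tiering. A hypothetical autofs layout (every server name, path, and option here is invented for illustration):

    # /etc/auto.master
    /data    /etc/auto.data

    # /etc/auto.data -- hot tree on the fast filer, cold tree on the SATA box
    hot    -fstype=nfs,vers=3,rsize=65536,wsize=65536   fastfiler:/vol/ssd_tier
    cold   -fstype=nfs,vers=3,rsize=65536,wsize=65536   bigfiler:/vol/sata_tier

No array-side magic required, and the clients mount only what they touch.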
For Netapp tiering, just get FC or SAS and configure those as caching volumes (you can do this in the same controller, or as an outboard cache).
Next, for a VMDK farm of 8,000 VMs that live on SATA, simply pre-populate the cache tier by letting it warm up naturally, or, if you're in a hurry, run ls -la across the directory tree.
Let PAM handle the rest.
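And if ls -la feels too crude, here's the same warm-up as a Python walker (reading the first few KB of each file is my own addition; drop it if stat-ing the metadata is all you want):

    import os, sys

    def warm(root, read_bytes=4096):
        """Walk the tree, stat everything, and optionally read the first
        few KB of each file so the cache tier pulls the blocks in."""
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    os.stat(path)              # metadata, same as ls -la
                    with open(path, "rb") as f:
                        f.read(read_bytes)     # a taste of the data blocks
                except OSError:
                    pass                       # VM moved or locked; skip it

    if __name__ == "__main__":
        warm(sys.argv[1])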
2/3rds of the above technologies have been around since 2002 or earlier; adding PAM just makes them go faster and adds an additional "tier". Granted, this assumes you're running VMware over NFS, but it makes more sense to me than having 25 different "tiers" of RAID types and spindle RPMs.
My last comment... I swear.
/goes back to kicking his HDFS cluster