NetApp punches Isilon right in the scaled-out clusters

NetApp doesn't do scale-out clustering, everyone knows that. Oh but it does, and it has whacked Isilon firmly in its scaled-out cluster with a SPECsfs2008 benchmark that is 36 per cent faster using half the disks and 116 fewer nodes. EMC Isilon was the top SPECsfs2008 NFS dog with 1,112,705 IOPS recorded from a 140-node S200 …

COMMENTS

This topic is closed for new posts.
  1. dikrek
    Thumb Up

    It's important to realize it's really one cluster

    Hi all, D from NetApp.

    It's crucial to realize that the NetApp submission is one legit cluster with data access and movement capability all over the cluster and a single management interface, and NOT NAS gateways on top of several unrelated block arrays.

    See here for a full explanation, towards the end: http://bit.ly/uuK8tG

    D

    1. Isilon_Nick
      Thumb Down

      But NOT Scale-Out! ;)

      It's important to realize this is 96 RAID groups, 24 file systems and 24 separate controllers, which means you can't actually do real quotas, replication, or snapshots. Ironically, it's not at all enterprise and much more appropriate for HPC.

      It's also worth pointing out that you can't scale a single workload without painstaking alignment of data across the various controllers.

      A solid SPEC submission, no doubt - but not a scale-out result.

      @Isilon_Nick

      1. dikrek

        sure it's scale-out

        Just not the exact same way Isilon does it. Doesn't necessarily tackle the same problems, either.

        What do you mean we can't do "real quotas, replication or snapshots"? Those are things more deeply ingrained in the NetApp DNA than in any other platform :)

      2. Moondevil
        FAIL

        I'd suggest.....

        ...that you understand ONTAP 8.1 Cluster-Mode before posting. Quotas, replication, and snapshots, as well as deduplication, compression, thin provisioning, etc., are all available across the cluster. Additionally, as detailed in the submission, no workload alignment was carried out; 23/24ths of the workload went across the cluster network for data access.

  2. Nate Amsden

    devil in the details

    I thought something was up when the "Number of file systems" was "single namespace", and they mention the clients mounting 24 different file systems from 10 different IPs each - 240 mount points per client?

    "For UAR compliance, each flexible volume was mapped to a subdirectory of the global namespace under root (/data1,/data2,...,/data24). Each volume was accessed over all data network interfaces (ip addresses ip1...ip24) such that each volume had a unique mount point on each node. This ensured that 1/24th of the total access on each node was done through a local interface and 23/24th of the total access was done through interfaces which were remote to the node the volume resides on. For these remote accesses, data traversed the cluster network. There were a total of 24 IP addresses available for data access, one per node. Each client mounted all 24 volumes using 10 different target IP addresses for a total of 240 mount points. The ten IP addresses selected per client for mounting each volume was done in a round-robin manner from the list of 24 IP addresses such that each successive client used the next ten IP addresses in the series. For example LG1 was assigned the following mount-point list: ip1:/data1, ip2:/data1,...,ip10:/data1,ip1:/data2,...,ip10:/data2,...,ip1:/data24,...,ip10:/data24. LG2 was assigned the following mount-point list picking up next IP adress in the list for each volume:ip11:/data1,ip12:/data1,...,ip20:/data1,...,ip11:/data24,...,ip20:/data24. LG3 continued from the next IP address circling back to the start of the series once the last IP was reached: ip21:/data1,..,ip24:/data1,ip1:/data1,..,ip6:/data1,...,ip21:/data24,..,ip24:/data24,ip1:/data24,..,ip6:/data24. This was done for all 36 clients. This ensured that data access to every volume was uniformly distributed across all clients and target IP addresses. The volumes were striped evenly across all the disks in the aggregate using all the SAS adapters connected to storage backend. "

    I wish SPEC SFS had the same disclosure as SPC-1 (esp wrt pricing), but it still has some interesting tidbits.

    The real story is of course the flash cache; 12TB of it is where the performance is coming from. Too bad (for them) their 'cluster mode' isn't as good as the competition's. I do not see Isilon short-stroking in their SFS results (as I believe was mentioned in the article), but it does appear that NetApp is, having exported only half of the capacity of the system to the clients.

    Isilon is exporting 3 times more usable storage to the clients than NetApp.

    I wonder what Isilon would get if they ran their test with RAID 10 instead of RAID 5 13+1 (ugh!!) - I'm thinking greatly improved write performance, plus parallel reads from both members of each RAID mirror for increased read performance. Isilon was also using 10k RPM disks; bump it to 15k RPM and RAID 10 just for shits & grins and see what the results are! I would expect at least, say, a 40% increase in performance (at least from the disks). Whether or not the controllers can handle that is another question.

    Why anyone would use RAID 5 or RAID 6 when I/O is the most critical thing (as it is in these tests) boggles my mind. I know NetApp has self-imposed limitations of RAID 6 on their platform, another mind-boggling thing; I don't know if Isilon does the same, I guess I should fire off an email to one of my friends at Isilon and find out.

    With NetApp only exporting half of their usable storage, it would make a lot more sense to do RAID 10 there too (this IS an I/O test after all), if they supported it. Running RAID-DP on 450GB 15k RPM disks is a waste, simply put; the risk of a double disk failure is very small with that spindle speed and disk size. RAID-DP certainly makes sense, by contrast, if you're running 15+2 on 2TB SATA disks when using old-style "whole disk" RAID schemes anyway.
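
    For what it's worth, the capacity side of that trade-off is simple arithmetic (illustrative fractions only; this ignores spares, right-sizing and filesystem overhead):

        # Usable-capacity fraction for the RAID layouts being argued about here.
        def usable_fraction(data_disks, redundancy_disks):
            return data_disks / (data_disks + redundancy_disks)

        print(f"RAID 5  13+1 : {usable_fraction(13, 1):.0%} usable")  # ~93%
        print(f"RAID-DP 15+2 : {usable_fraction(15, 2):.0%} usable")  # ~88%
        print(f"RAID 10      : {usable_fraction(1, 1):.0%} usable")   #  50%

    So RAID 10 costs you roughly 40 points of capacity versus the parity schemes, which is exactly why I'd only spend it where I/O is the thing being measured.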

    Could NetApp have pushed the I/O numbers higher? The 6240 tested tops out at 6TB of flash cache per controller and the SFS test had only 512GB per controller. Maybe 512GB was enough given the workload (or does that mean the controller was tapped out at 512GB)? I don't know.
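
    And just to put the flash numbers in one place (taking the figures from the submission and this thread at face value, not re-verified by me):

        # 24 controllers x 512GB Flash Cache = the 12TB figure I mentioned above;
        # against the claimed 6TB-per-controller ceiling that's a lot of headroom.
        controllers = 24
        flash_per_controller_tb = 0.5
        print(f"Total Flash Cache tested: {controllers * flash_per_controller_tb:.0f} TB")  # 12 TB
        print(f"Headroom per controller : {6 - flash_per_controller_tb:.1f} TB unused")     # 5.5 TB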

    1. dikrek
      Happy

      It helps to understand the details

      Hi Nate,

      Short-stroking?? Isilon only used about 128TB of the 864TB available! Go to my post for more details. In general, exported vs usable means nothing performance-wise for NetApp due to the way WAFL works.

      Isilon doesn't do typical RAID either; its protection is per-file, so your observations are not quite correct. More education is needed to understand both architectures - and they are as different from each other as they could possibly be... and don't solve all the same problems.

      Anyway - not sure what your affiliation is but if you're on the customer side of things you should be happy we are all trying to make faster, more reliable and more functional boxes for y'all! :)

      D

    2. dikrek

      Regarding the number of mount points

      Looking at the Isilon submission:

      "Uniform Access Rule Compliance

      Each load-generating client hosted 64 processes. The assignment of processes to network interfaces was done such that they were evenly divided across all network paths to the storage controllers. The filesystem data was striped evenly across all disks and storage nodes."

      How are the 64 processes per client being generated in your opinion? Single mount point per client?
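
      Genuine question; the submission only says "evenly divided", so presumably something like a round-robin spread of processes over target interfaces (my guess at the shape of it, not Isilon's actual tooling):

          # One plausible reading of "evenly divided across all network paths":
          # deal the 64 processes on each client out over the target interfaces.
          PROCESSES_PER_CLIENT = 64

          def assign_processes(interfaces):
              """Map process index -> target interface, round-robin."""
              return {p: interfaces[p % len(interfaces)]
                      for p in range(PROCESSES_PER_CLIENT)}

          # e.g. assign_processes(["ip1", "ip2", "ip3", "ip4"]) puts 16 processes
          # on each interface - which still tells us nothing about mount points.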

  3. Anonymous Coward
    Anonymous Coward

    No

    People use RAID 5 or RAID 6 because that's what real-world implementations use. I find it humorous that you make all these comments about Isilon and NetApp when you clearly don't have a clue what you're talking about. Isilon doesn't do "RAID 10" because it can't. They aren't set up as "13+1". They don't do traditional RAID; they're based on Reed-Solomon erasure coding. It doesn't use 15k drives, because it can't. It does, however, have flash, just like the filers. Except it's got more.

  4. Anonymous Coward
    Anonymous Coward

    So they clustered 24 x FAS6240s to deliver 10x the performance of 1 x FAS6080. Seems like a lot of effort and cost for a disproportionate reward!

    1. dikrek
      Stop

      No, we clustered 12x 6240 HA systems, not 24!

      The 6080 in the submissions is a 2-controller system.

      The 24 nodes come in 12 pairs.

      So 12 systems to get 10x the performance is pretty darn good scalability for a scale-out architecture that has the systems connected over a back-end cluster network, which itself imposes some performance overhead.
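
      Quick back-of-the-envelope on the round numbers in this thread (and only those round numbers - a 6240 HA pair is not the same box as a 6080 pair, so treat this as illustrative):

          # "12 HA systems for ~10x one 2-controller 6080" as a fraction of
          # perfectly linear scaling - rough figures from this thread only.
          ha_systems = 12
          speedup = 10
          print(f"{speedup / ha_systems:.0%} of linear")  # ~83%

      And that's with every node pushing 23/24ths of its accesses over the cluster interconnect.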

      Is this more clear?

      D

