I'm glad I'm not
I'm glad I'm not in charge of the project to move the whole of Google from GFS to GFS2. Good luck to whoever is. I hope they are wearing massively reinforced underwear when the time comes to press that button!
As its custom-built file system strains under the weight of an online empire it was never designed to support, Google is brewing a replacement. Apparently, this overhaul of the Google File System is already under test as part of the "Caffeine" infrastructure the company announced earlier this week. In an interview with the …
...one day Google will announce the invention of a dynamic file storage and distribution service along the lines of a p2p grid computing system.
Google will allow people to download a free screensaver, tying in with Google Desktop, Google Documents etc., which will use individual users' computers to help with the load, at virtually no cost to Google (assuming they don't use a master data centre to process what the users' computers spit out).
Why the hell did they not use Lustre? Or haggle a promotional discount (for advertising space) on something like QFS?
Sure, other distributed filesystems are out there and free, so why code an in-house special? Or am I thinking Lustre is stronger than it is (or imagining GFS is not really that large)?
HPC eats data space like crazy, so why not Lustre?
For undisclosed reasons..... Lustre, QFS, ZFS, GPFS (and .....) come to mind. A proper mix of those should easily do the job.
They've been doing it on a terabyte scale for many years, after all.....
paris, cuz she would have figured that a long time ago.....
rg
We (the guys in charge of moving all the data from GFS to its successor, and the day-to-day maintenance) had a good laugh in the office about the comment "wearing massively reinforced underwear". Sometimes it's better not to wear underwear when doing these sorts of upgrades...
As for "other tools": Lustre was invented as a local network filesystem. GFS was invented to handle thousands of tasks all reading & writing as fast as they could, all day, every day. The indexing pipeline: download the internet, index it, run a few MapReduces over it to mark down spammy, crappy, duplicate and dead sites etc., and then compress it so it can be shipped all over the place. As Sean says in his interview, these days 'routine use' is dozens of petabytes of data that has to be randomly accessed - as in, the metadata has to stay in RAM.
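To make the "mark down duplicate sites" step above concrete, here is a toy, in-memory MapReduce-style pass in Python. The page contents, URLs and the hash-the-body dedup rule are all illustrative assumptions, not anything Google has described; a real pipeline runs the map and reduce phases distributed across thousands of machines.

```python
import hashlib
from collections import defaultdict

# Toy crawl: URL -> page body. Two pages share identical content.
pages = {
    "http://a.example/1": "hello world",
    "http://b.example/1": "hello world",   # duplicate content
    "http://c.example/1": "something else",
}

def map_phase(pages):
    # Map: emit (content-hash, url) so identical pages share a key.
    for url, body in pages.items():
        yield hashlib.sha1(body.encode()).hexdigest(), url

def shuffle(pairs):
    # Shuffle: group values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: within each group, every url after the first is a duplicate.
    dupes = set()
    for urls in groups.values():
        dupes.update(sorted(urls)[1:])
    return dupes

duplicates = reduce_phase(shuffle(map_phase(pages)))
print(duplicates)  # flags http://b.example/1 as a duplicate of a.example/1
```

The same map/shuffle/reduce shape covers the other markdown passes (spammy, dead, crappy sites); only the map key and the reduce rule change.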
One of the other posters mentions commercial products - I don't know about the others, but the bloke who says GFS scale is way beyond what's available is wrong. The PERCS project, paid for by the good ole US government, is scaling GPFS to...
# 1 trillion files in a single file system
# 32,000 file creates per second
# 10,000 metadata operations per second
# 6TB/s throughput
# 100s of PB of data
# add more scary stats...
All in a single filesystem by 2010 (not so far away)... Why not work with IBM? I'm sure they'd be so desperate to get some Chocolate Factory kudos it would be practically given away :-D
http://www.almaden.ibm.com/StorageSystems/projects/percs/
Since it's designed for HPC, it differs from the demands of MapReduce, but it would support any amount of real-time processing they fancy in the future.
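A quick back-of-envelope check on the PERCS targets quoted above (the figures are the ones listed; the arithmetic is just mine): at the stated 32,000 file creates per second, filling a single filesystem with 1 trillion files takes roughly a year of flat-out creating.

```python
# Sanity check on the quoted PERCS numbers: 1 trillion files at
# 32,000 creates/second, assuming a sustained rate with no slowdown.
files = 1_000_000_000_000
creates_per_sec = 32_000

seconds = files / creates_per_sec          # 31,250,000 seconds
days = seconds / 86_400                    # seconds per day
print(round(days, 1))                      # ~361.7 days
```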
"Still, trying to build an interactive database on top of a file system that was designed from the start to support more batch-oriented operations has certainly proved to be a pain point."
DEC was apparently 30 years ahead of its time, providing Datatrieve as a pain point.
Mine's the one with the wombat in the pocket.