I'm glad I'm not
I'm glad I'm not in charge of the project to move the whole of Google from GFS to GFS2. Good luck to whoever is. I hope they are wearing massively reinforced underwear when the time comes to press that button!
As its custom-built file system strains under the weight of an online empire it was never designed to support, Google is brewing a replacement. Apparently, this overhaul of the Google File System is already under test as part of the "Caffeine" infrastructure the company announced earlier this week. In an interview with the …
...one day Google will announce the invention of a dynamic file storage and distribution service along the lines of a p2p grid computing system.
Google will allow people to download a free screensaver, tying in with Google Desktop, Google Documents etc., which will use individual users' computers to help with the load, at virtually no cost to Google (assuming they don't use a master data centre to process what the users' computers spit out).
Why the hell did they not use Lustre? Or haggle a promotional discount (for advertising space) on something like QFS?
Sure, other distributed filesystems are out there and free, so why code an in-house special? Or am I thinking Lustre is stronger than it is (or imagining GFS is not really that large)?
HPC eats data space like crazy, so why not Lustre?
For undisclosed reasons..... Lustre, QFS, ZFS, GPFS (and .....) come to mind. A proper mix of those should easily do the job.
They've been doing it on a terabyte scale for many years, after all.....
paris, cuz she would have figured that a long time ago.....
rg
We (the guys in charge of moving all the data from GFS to its successor, and the day-to-day maintenance) had a good laugh in the office about the comment "wearing massively reinforced underwear". Sometimes it's better not to wear underwear when doing these sorts of upgrades...
As for "other tools": Lustre was invented as a local network filesystem. GFS was invented to handle thousands of tasks all reading & writing as fast as they could, all day, every day. The indexing pipeline: download the internet, index it, run a few MapReduces over it to mark down spammy, crappy, duplicate and dead sites etc., and then compress it so it can be shipped all over the place. As Sean says in his interview, these days 'routine use' is dozens of petabytes of data that has to be randomly accessed - as in, the metadata has to stay in RAM.
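To make the "mark down duplicate sites" step above concrete, here is a toy, in-memory MapReduce-style pass in Python. The page contents, URLs and the hash-the-body dedup rule are all illustrative assumptions, not anything Google has described; a real pipeline runs the map and reduce phases distributed across thousands of machines.

```python
import hashlib
from collections import defaultdict

# Toy crawl: URL -> page body. Two pages share identical content.
pages = {
    "http://a.example/1": "hello world",
    "http://b.example/1": "hello world",   # duplicate content
    "http://c.example/1": "something else",
}

def map_phase(pages):
    # Map: emit (content-hash, url) so identical pages share a key.
    for url, body in pages.items():
        yield hashlib.sha1(body.encode()).hexdigest(), url

def shuffle(pairs):
    # Shuffle: group values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: within each group, every url after the first is a duplicate.
    dupes = set()
    for urls in groups.values():
        dupes.update(sorted(urls)[1:])
    return dupes

duplicates = reduce_phase(shuffle(map_phase(pages)))
print(duplicates)  # flags http://b.example/1 as a duplicate of a.example/1
```

The same map/shuffle/reduce shape covers the other markdown passes (spammy, dead, crappy sites); only the map key and the reduce rule change.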
One of the other posters mentions commercial products - I don't know about the others, but the bloke who says GFS scale is way beyond what's available is wrong. The PERCS project, paid for by the good ole US government, is scaling GPFS to...
# 1 trillion files in a single file system
# 32,000 file creates per second
# 10,000 metadata operations per second
# 6TB/s throughput
# 100s of PB of data
# add more scary stats...
All in a single filesystem by 2010 (not so far away)... Why not work with IBM? I'm sure they'd be so desperate to get some Chocolate Factory kudos it would be practically given away :-D
http://www.almaden.ibm.com/StorageSystems/projects/percs/
Since it's designed for HPC, it differs from the demands of MapReduce, but it would support any amount of real-time processing they fancy in the future.
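A quick back-of-envelope check on the PERCS targets quoted above (the figures are the ones listed; the arithmetic is just mine): at the stated 32,000 file creates per second, filling a single filesystem with 1 trillion files takes roughly a year of flat-out creating.

```python
# Sanity check on the quoted PERCS numbers: 1 trillion files at
# 32,000 creates/second, assuming a sustained rate with no slowdown.
files = 1_000_000_000_000
creates_per_sec = 32_000

seconds = files / creates_per_sec          # 31,250,000 seconds
days = seconds / 86_400                    # seconds per day
print(round(days, 1))                      # ~361.7 days
```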
"Still, trying to build an interactive database on top of a file system that was designed from the start to support more batch-oriented operations has certainly proved to be a pain point."
DEC was apparently 30 years ahead of its time, providing Datatrieve as a pain point.
Mine's the one with the wombat in the pocket.