Wow
"Facebook was juggling about 15 billion on-site messages a month (about 14 TB) and 120 billion chat messages (11 TB)"
That's a whole lotta Farmville updates!
About a year ago, when Facebook set out to build its email-meets-chat-meets-everything-else messaging system, the company knew its infrastructure couldn't run the thing. "[The Facebook infrastructure] wasn't really ready to handle a bunch of different forms of messaging and have it happen in real time," says Joel Seligstein, a …
hi,
What is actually being said here is that software from Oracle, IBM and other commercial vendors is so overpriced that Facebook would rather create its own. Big fail for Oracle. Where are those super sales people taking the executives out to lunch and golf?
I know that Facebook is big-big-big yardy-yardy-ya, but this whole article makes Facebook look like one big post-grad project.
Sitting in a cold Computer Science lab, with no money, little kit and an unmovable demo deadline... this would be genuinely very impressive… "an effort to minimize data loss" clearly looked in great detail at complex scenarios... but…
The data volumes aren’t CERN LHC… but LHC gives a clue to the alternatives…
Facebook is not poor, and they are not pushing human knowledge into the unknown… they could have simply stopped off at Oracle with a shopping list and splashed the cash…
Putting large unspecified blobs of data into SQL is a performance killer for any SQL, MySQL included. SQL is storage for _STRUCTURED_ data, not for blobs.
So Facebook's decision is not particularly surprising. There are technical solutions to this (isolate the structure, put in SQL, put blob elsewhere), but none of them is likely to scale to Facebook size.
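The "isolate the structure, put it in SQL, put the blob elsewhere" workaround mentioned above can be sketched in a few lines. This is a toy illustration, not anyone's production design: metadata rows live in SQL, the opaque blob goes to a file store, and the row only keeps a content hash pointing at it.

```python
import hashlib
import os
import sqlite3
import tempfile

# Metadata in SQL, blobs outside it. All names here are illustrative.
blob_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, blob_key TEXT)")

def store_message(sender: str, body: bytes) -> int:
    key = hashlib.sha256(body).hexdigest()
    with open(os.path.join(blob_dir, key), "wb") as f:  # blob lives on the filesystem
        f.write(body)
    cur = db.execute("INSERT INTO messages (sender, blob_key) VALUES (?, ?)", (sender, key))
    return cur.lastrowid

def load_message(msg_id: int) -> bytes:
    (key,) = db.execute("SELECT blob_key FROM messages WHERE id = ?", (msg_id,)).fetchone()
    with open(os.path.join(blob_dir, key), "rb") as f:
        return f.read()

mid = store_message("alice", b"hello " * 1000)
assert load_message(mid) == b"hello " * 1000
```

The SQL side stays small and indexable; the blob store only ever does whole-object reads and writes. The catch, as the comment says, is that the blob tier itself now has to scale, which is roughly where HBase-style systems come in.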
maybe they've got the same or similar reasons as Google had for creating "Big Table". I assume you're familiar with a product Oracle sells that does the same thing at the same speed. I don't think CERN is a valid comparison, unless the particles are sending each other emails.
I don't care for Facebook much either, but it does sound like they've got some very bright guys working there. Larry Whatshisname doesn't need more cash in his greedy maw either.
+1 for Begineer, I'm going to start using that one.
Begineer = newly qualified software engineer (usually highly trained in AI and other useless stuff).
It goes with "Vidiot" which is a term that we use to describe self-appointed experts who talk complete bollocks about anything to do with video signals or compression.
Among talking-head gobshites spouting marketspeak without any inkling of what they're on about.
And behind the curtain, maybe a dozen actual code monkeys on salary, wrestling with the cold, cruel realities of mmap() and fsync(). Rise up, brethren! Seize the means of production and overthrow your corporate mouthwhore overlords!
Am I the only one confused about why they're sending a clone of a message to each recipient? Cloning a 200 KB message to 6 people, along with the backend to shard it out, takes far more resources than just letting those 6 people reference the original. If a reader "deletes" it, they'd simply be removing their link to the post, while still letting the 5 other readers get it...
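The reference model described above is easy to sketch. This is a toy in-memory version of the commenter's idea, not Facebook's actual design: one stored copy, each recipient holds a link, and "delete" drops only that recipient's link, reclaiming the body when the last reference goes.

```python
# Illustrative reference-counted message store; all names are made up.
bodies = {}     # msg_id -> [body, refcount]
mailboxes = {}  # user -> set of msg_ids the user can still see

def send(msg_id: str, body: str, recipients: list[str]) -> None:
    bodies[msg_id] = [body, len(recipients)]        # one copy, N references
    for user in recipients:
        mailboxes.setdefault(user, set()).add(msg_id)

def delete(user: str, msg_id: str) -> None:
    mailboxes[user].discard(msg_id)                 # drop only this user's link
    bodies[msg_id][1] -= 1
    if bodies[msg_id][1] == 0:                      # last reader gone: reclaim
        del bodies[msg_id]

send("m1", "x" * 200_000, ["a", "b", "c"])
delete("a", "m1")
assert "m1" in bodies          # b and c still reference the single copy
delete("b", "m1")
delete("c", "m1")
assert "m1" not in bodies      # storage reclaimed
```

The likely counterargument, in a sharded system, is locality: if the one canonical copy lives on a different shard from the reader's mailbox, every read becomes a cross-machine fetch, which is exactly what denormalized per-recipient copies avoid.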
To be fair to Snoreacle, just because an app like Hbase is better for one particular and very specialised task, it does not mean it is suited to the much more common, commercial tasks that Oracle DB is suited to. And those commercial uses probably add up to far, far more licence income than Oracle would have made if it had just made Hbase. I'm sure Larry's response would just be one big "meh".
"Even before the new messaging system was rolled out, Facebook was juggling about 15 billion on-site messages a month"
Useless measurement here, how about some real-world measurements, like 6000 messages (transactions) per second, globally.
Most of those are just a few bytes (less than a DB block, 4 KB, anyway), so a lot, but nothing exceptional.
Transaction tests have managed to squeeze 60,000 transactions per second from an Oracle server running on a single PC in 2007. Not the same thing, lab tests are something else than the real world, but it gives us the scale hardware can cope with.
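The ~6000/s figure above is easy to check against the article's monthly totals. A quick back-of-envelope, assuming a 30-day month and a perfectly even spread (real traffic is bursty, so peaks would be well above these averages):

```python
# Back-of-envelope check of the "6000 messages per second" figure.
SECONDS_PER_MONTH = 30 * 24 * 3600          # 2,592,000 s

messages_per_sec = 15_000_000_000 / SECONDS_PER_MONTH   # on-site messages
chats_per_sec = 120_000_000_000 / SECONDS_PER_MONTH     # chat messages

print(round(messages_per_sec))   # ~5787 on-site messages/s on average
print(round(chats_per_sec))      # ~46296 chat messages/s on average
```

So 6000/s is about right for the on-site messages alone; folding in chat pushes the combined average past 50,000/s, which starts to look less comfortable for a single box.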
Large-scale computing just does not work like this. There are so many examples of people who thought they could apply 'enterprise-ready' solutions to distributed web-based apps and failed, even with basic web sites (see the france.fr debacle for an excellent example).
I mean you no ill will, but I have been working in this area for many years, and you really do have to think in a different manner than simply transactions per second.
Some simple examples (some mentioned already in the article). Think in terms of the MTBF of your kit when you are running a farm that contains as few as 1000 machines. Hardware failures become normal events that you have to engineer for. The idea that you can accept 4 hours of downtime per two years on your database server and still hit your 99.9% SLA does not apply.
What about backing up 100s of TB of data? A distributed model can eliminate the need for backups.
Latency - these database backends also need to return your profile page, your unread message count and so on. Cache solutions need to recalculate.
Creeping death. One node in a cluster goes down, increasing the load on the other nodes and causing the whole lot to go down.
Etc., Etc.
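The MTBF point above is worth putting numbers on. A rough sketch, where the 2-year per-machine MTBF is an assumed figure for illustration only: per-box reliability that looks great for one server still means failures every single day across a 1000-machine farm.

```python
# Rough fleet-failure arithmetic; the per-box MTBF is an assumption.
FARM_SIZE = 1000
MTBF_DAYS = 2 * 365                  # assume each box fails about once every 2 years

failures_per_day = FARM_SIZE / MTBF_DAYS
print(round(failures_per_day, 2))    # ~1.37 expected failures per day
```

At more than one failure a day, "restore from backup and carry on" stops being an operational procedure and becomes a full-time job, which is why these systems are engineered so that node failure is a non-event.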