Blaming it on their F5s?
I call bullshit. I've seen F5s used for years without problems - the only place that seems to have consistent trouble with them is MSO.
A corrupted file in Microsoft's DNS services brought down its cloud across the world, the software giant has revealed. In a dramatic failure, Office 365 and Windows Live services including Hotmail and SkyDrive fell over for more than three hours earlier this month, causing further embarrassment for Redmond. No customer data was …
It doesn't say F5 anywhere in the article or the linked blog post, I call knee jerk.
...but that's what they use ; )
As often as they have trouble with them (these BPOS-S outages aren't the half of it - for a while we were averaging a load balancer issue of one sort or another every 2-3 months for our dedicated environment) the only reasonable explanation is operator error.
"further hardening the DNS service to improve its overall redundancy and fall-over capability"
I don't think they need to improve the fall-over capability - that part certainly seems to be working well.
Access to your data anywhere
Then suddenly - No access from anywhere.
Most cloud professionals* would add the proviso: "...but probably best not *that* cloud..."
There are a few companies out there that seem to know their stuff; enviable uptime and reliability. That's never really been Microsoft's strong point, has it?
*Cue luddite hordes with their "cloud professional? Tautology, that!" cleverness...
Logic 101.
I meant contradiction, not tautology. I feel such a fool...
...other than for backup.
Anyone's server can go down. Even your own. But you can do something about your own server. You don't want to be in the hands of someone not entirely interested in your company's profits, only their own liability clause.
Remind me again why I should trust a company with centralized control of my data ... Especially when that company spent decades trying to move control of the personal desktop from mainframe data centers to the personal computer?
No, thank you. I'll keep it in-house. For values of "in-house" that include a couple continents. Honestly, it's not all that hard to roll your own.
I can't tell you how many times 'load-balancing devices in the DNS service respond to a malformed input string' gives me problems with my desktop computer. What a relief I can depend on others to fix it now. Then again, when 'load-balancing devices in the DNS service respond to a malformed input string' on my network, it has never brought the entire MS online services product suite down across the world for everyone else. Go Cloud!
So what he is saying is that some idiot added crap to the configuration file and it got propagated across the network
by two "rare conditions" = Muppet didn't have a second person to eyeball his/her handiwork before committing the change to the configuration file; in addition, they altered the configuration file directly rather than copying the one from the test machine.
Really, this is IT 101....
Not to defend MS, God forbid, but as I read it, it seems to say that the configuration file itself was OK; the problem was that somehow it got corrupted in transit, and the thingy responsible for taking care of such situations failed miserably too.
MS things becoming unresponsive, crashing or failing to work. Where have I heard that (several dozen times) before?
Having seen a 23K MS Word 2003 DLL take down an Exchange server when it failed, I can believe this.
This would be the downside of making everything inter-operable.
BOOM and there's the reason I won't host any company data in "the cloud". If there's downtime to be had I want to be the one instigating or fixing it!
that Windows 7 didn't get any less attention than the cloud did in keeping vital files accurate.
"the software was unable to parse an incorrectly constructed line in the configuration file"
The above translates to "one of our engineers fat fingered it"
Ahh, the vagaries of human error.
Well, it's partially human.... The so-called file should have a parser to catch errors before they are sent out to various other servers, no?
That should eliminate any human issue. Now, whether there is a check at each server for validation issues is another question. It's called check and recheck and then do a checksum.
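For what it's worth, the "check and recheck and then do a checksum" idea can be sketched roughly like this. It is only an illustration, not how Microsoft's tooling works: the "name = value" line format and the function names validate, publish and accept are all made up for the example, since the article never describes the real file format.

```python
import hashlib
import re

# Hypothetical line format "name = value"; the real format is not described anywhere.
LINE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*\s*=\s*\S.*$")


def validate(text):
    """Return a list of (line_number, line) pairs that fail to parse."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comments are allowed and skipped.
        if not stripped or stripped.startswith("#"):
            continue
        if not LINE_RE.match(stripped):
            errors.append((number, line))
    return errors


def checksum(text):
    """SHA-256 digest of the file contents, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def publish(text):
    """Sender side: refuse to ship a file that does not parse."""
    errors = validate(text)
    if errors:
        raise ValueError(f"refusing to publish, bad lines: {errors}")
    return text, checksum(text)


def accept(text, expected_digest):
    """Receiver side: recheck the digest, then the syntax, before applying."""
    if checksum(text) != expected_digest:
        return False  # corrupted on the way over
    return not validate(text)
```

The sender parses before anything leaves the building, and each receiving server checks the digest and parses again, so a file that is bad at the source or mangled in transit gets rejected instead of applied.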
Now Apple begins to shake in their boots. Guys, hurry up with the data centre, Billy's network crashed!
'cos it's light and fluffy and insubstantial, I guess
...and occasionally can cause major catastrophes that no-one can control or predict.
... such overlooked little services in a basket.
They did use to run their four DNS servers in the same subnet, didn't they? Oh and they got their all-important everything-depends-on-this sso domain suspended for non-payment, too. Why companies feel they need to sprawl across dozens of domains, all interdependent, is a little beyond me. But maybe reasons why or why not are just a little beyond them. They're certainly not the only tech giants to bugger this one up regularly. As self-proclaimed world improvers employing supposedly the world's finest tech heads and with plenty of resources to fix it all up neat and tidy, their antics do seem a bit pathetic, however.
I seem to be hitting "rare conditions" daily as far as Microsoft software is concerned...
No wonder MS themselves are hitting them once a fortnight or so.
Or are they running bind9 on Red Hat Linux?
...IT Service Management consultant on contract at MS a year or so ago. I quickly realized that their infrastructure management skill levels and practices were abysmal. I told them what they needed to do and got out as soon as decently possible - I didn't want to be associated with such a crowd of no-hopers. It seems nothing has changed.
"A tool that helps balance network traffic was being updated and the update did not work correctly...
Taste your own medicine, MS. So now you know just how bloody frustrating it is when your updates don't work correctly
& did the "helpline" assistant go "ooh, I think you'll have to buy another license for that"?
But funnily enough they do.
Higher ROI with service credits than shares.
At this rate of failures, we'll be using them for free.
Huh....epic fail, to be sure, but I have a Hotmail account (foisted upon me against my will by a higher educational institute which shall soon give me a fancy piece of paper that I'll put in a frame and reference on a resume but otherwise never think of again) and never noticed the outage. Then again, I only reluctantly use that account.
...but computers are excellent amplifiers. They wouldn't be the first outfit to fall victim to a self-inflicted DDoS. I think there must be an axiom about resilient systems in here somewhere.
While the number of single points of failure (SPF) is inversely proportional to the number of redundant features, SPF can only approach (but never reach) a lower limit of 1.
To be personal, and NOT connected to a central point (Mainframe) of failure.
..................for those wonderful companies who promise everything and deliver everything, INCLUDING outages over which you have little or no control.
Microsoft can now make millions of people scream - ALL AT THE SAME TIME!