Microsoft cloud evaporated by one busted file

A corrupted file in Microsoft's DNS services brought down its cloud across the world, the software giant has revealed. In a dramatic failure, Office 365 and Windows Live services including Hotmail and SkyDrive fell over for more than three hours earlier this month, causing further embarrassment for Redmond. No customer data was …

This topic is closed for new posts.
Anonymous Coward

Blaming it on their F5s?

I call bullshit. I've seen F5s used for years without problems - the only place that seems to have consistent trouble with them is MSO.

Anonymous Coward

Err?

It doesn't say F5 anywhere in the article or the linked blog post, I call knee jerk.

Anonymous Coward

It doesn't say it

...but that's what they use ; )

As often as they have trouble with them (these BPOS-S outages aren't the half of it - for a while we were averaging a load balancer issue of one sort or another every 2-3 months for our dedicated environment) the only reasonable explanation is operator error.

Anonymous Coward

"further hardening the DNS service to improve its overall redundancy and fall-over capability"

I don't think they need to improve the fall-over capability - that part certainly seems to be working well.

Keep it all on the cloud

Access to your data anywhere

Then suddenly - No access from anywhere.

Windows

Most cloud professionals* would add the proviso: "...but probably best not *that* cloud..."

There are a few companies out there that seem to know their stuff; enviable uptime and reliability. That's never really been Microsoft's strong point, has it?

*Cue luddite hordes with their "cloud professional? Tautology, that!" cleverness...

Facepalm

D'oh.

Logic 101.

I meant contradiction, not tautology. I feel such a fool...

I would say that most professionals would not keep their data on any cloud...

...other than for backup.

Anyone's server can go down. Even your own. But you can do something about your own server. You don't want to be at the hands of someone not entirely interested in your company's profits, only their own liability clause.

Numpties.

Remind me again why I should trust a company with centralized control of my data ... Especially when that company spent decades trying to move control of the personal desktop from mainframe data centers to the personal computer?

No, thank you. I'll keep it in-house. For values of "in-house" that include a couple continents. Honestly, it's not all that hard to roll your own.

Anonymous Coward

Let the 'Professionals' handle it

I can't tell you how many times 'load-balancing devices in the DNS service respond to a malformed input string' gives me problems with my desktop computer. What a relief I can depend on others to fix it now. Then again, when 'load-balancing devices in the DNS service respond to a malformed input string' on my network, it has never brought the entire MS online services product suite down across the world for everyone else. Go Cloud!

FAIL

Fixed it for you!

So what he is saying is that some idiot added crap to the configuration file and it got propagated across the network

by two "rare conditions" = Muppet didn't have a second person to eyeball his/her handiwork before committing the change to the configuration file; in addition, they altered the configuration file directly rather than copying the one from the test machine.

Really, this is IT 101....

Boffin

Well, not exactly.

Not to defend MS, God forbid, but as I read it, it seems to say that the configuration file itself was OK; the problem was that it somehow got corrupted in transit, and the thingy responsible for taking care of such situations failed miserably too.

Anonymous Coward

Hmmm

MS things becoming unresponsive, crashing or failing to work. Where have I heard that (several dozen times) before?

FAIL

Having seen a 23K MS Word 2003 DLL take down an Exchange server when it failed, I can believe this.

This would be the downside of making everything inter-operable.

Mushroom

BOOM and there's the reason I won't host any company data in "the cloud". If there's downtime to be had I want to be the one instigating or fixing it!

Alert

I'm glad to know...

that Windows 7 didn't get any less attention than the cloud did in keeping vital files accurate.

Megaphone

Exterminate! EXTERMINATE!

"the software was unable to parse an incorrectly constructed line in the configuration file"

The above translates to "one of our engineers fat fingered it"

Ahh, the vagaries of human error.

Thumb Down

re Exterminate! EXTERMINATE!

Well, it's partially human... The so-called file should have a parser to catch errors before they are sent out to various other servers, no?

That should eliminate any human issue. Now, whether there is a check at each server for validation issues is another possibility. It's called check and recheck and then do a checksum.
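
For what it's worth, a minimal sketch of that check, recheck, checksum idea in Python. Everything here is an assumption for illustration: the `key = value` line format, and the names `validate_config`, `checksum` and `safe_to_apply` are all made up, since the article doesn't say what Microsoft's configuration file actually looks like or how it is distributed.

```python
import hashlib
import re

# Hypothetical line format "key = value"; the real file format is not public.
LINE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*\s*=\s*\S.*$")


def validate_config(text):
    """Return a list of (line_number, line) pairs that fail to parse."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not LINE_RE.match(stripped):
            errors.append((number, line))
    return errors


def checksum(text):
    """SHA-256 digest of the file contents, computed by the sender."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def safe_to_apply(received_text, expected_checksum):
    """Receiver side: recheck integrity first, then syntax, before loading."""
    if checksum(received_text) != expected_checksum:
        return False  # corrupted somewhere between sender and receiver
    return not validate_config(received_text)
```

The sender would run `validate_config` before pushing anything and ship the digest alongside the file; each receiving server would call `safe_to_apply` and keep its old configuration if that returns `False`. That covers both the fat-finger case and the corrupted-in-transit case, at least in principle.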

Anonymous Coward

Now Apple begins to shake in their boots. Guys, hurry up with the data centre, Billy's network crashed!

Unhappy

Why do they call it a cloud?

'cos it's light and fluffy and insubstantial, I guess

...and occasionally can cause major catastrophes that no-one can control or predict.

Pint

BINGO!!!

nuff said.

The beer is on me.

Anonymous Coward

Such an important function...

... such overlooked little services in a basket.

They did use to run their four DNS servers in the same subnet, didn't they? Oh, and they got their all-important everything-depends-on-this SSO domain suspended for non-payment, too. Why companies feel they need to sprawl across dozens of domains, all interdependent, is a little beyond me. But maybe reasons why or why not are just a little beyond them. They're certainly not the only tech giants to bugger this one up regularly. As self-proclaimed world improvers employing supposedly the world's finest tech heads and with plenty of resources to fix it all up neat and tidy, their antics do seem a bit pathetic, however.

Windows

"Rare conditions"

I seem to be hitting "rare conditions" daily as far as Microsoft software is concerned...

No wonder MS themselves are hitting them once a fortnight or so.

How can you do this in MS DNS services?

Or are they running bind9 on Red Hat Linux?

Anonymous Coward

I worked as an...

...IT Service Management consultant on contract at MS a year or so ago. I quickly realized that their infrastructure management skill levels and practices were abysmal. I told them what they needed to do and got out as soon as decently possible - I didn't want to be associated with such a crowd of no-hopers. It seems nothing has changed.

Updates

"A tool that helps balance network traffic was being updated and the update did not work correctly...

Taste your own medicine, MS. So now you know just how bloody frustrating it is when your updates don't work correctly

& did the "helpline" assistant go "ooh, I think you'll have to buy another license for that"?

Go

They couldn't pay us to use their cloud

But funnily enough they do.

Higher ROI with service credits than shares.

At this rate of failures, we'll be using them for free.

Anonymous Coward

Huh....epic fail, to be sure, but I have a Hotmail account (foisted upon me against my will by a higher educational institute which shall soon give me a fancy piece of paper that I'll put in a frame and reference on a resume but otherwise never think of again) and never noticed the outage. Then again, I only reluctantly use that account.

Anonymous Coward

to err is human...

...but computers are excellent amplifiers. They wouldn't be the first outfit to fall victim to a self-inflicted DDoS. I think there must be an axiom about resilient systems in here somewhere.

While the number of single points of failure (SPF) is inversely proportional to the number of redundant features, SPF can only approach (but never reach) a lower limit of 1.

Wasn't the idea of "personal computers"...

To be personal, and NOT connected to a central point (Mainframe) of failure.

FAIL

Every CLOUD has a silver lining............

..................for those wonderful companies who promise everything and deliver everything, INCLUDING outages over which you have little or no control.

Anonymous Coward

So is this the Blue Sky of Death?

Microsoft can now make millions of people scream - ALL AT THE SAME TIME!
