No uninterruptible power supply?
UK flights CRIPPLED by system outage that shut ALL London airspace
All London airspace was closed to incoming and departing traffic for just under an hour on Friday afternoon due to a computer outage at the National Air Traffic Services – Blighty's air traffic control authority. According to the European Organisation for the Safety of Air Navigation, a machine failure resulted in all airspace …
COMMENTS
-
-
-
Friday 12th December 2014 16:07 GMT smudge
http://www.allen-diesels.com/case-studies/civil-aviation-authority.php
Two 16kV, 5MW diesel engines, it seems.
Note that this is not a comment on Allen Diesels, nor am I suggesting that they are in any way at fault. I merely came across that page when searching around t'interweb to see what sort of backup power supply Swanwick has.
-
-
Friday 12th December 2014 16:26 GMT smudge
There was a data centre (in Denmark or the Netherlands, I think), where the power supply failed.
The diesel generator came up as it should, and powered everything for a couple of days, by which time it needed refuelling.
The diesel tanker duly arrived - and crashed into the outbuilding, completely trashing the generator inside.
-
Friday 12th December 2014 16:35 GMT Xpositor
Quite a few years ago when I worked for a bank that was subsequently taken over by a Spanish bank, we had a UPS test one weekend. Everything worked flawlessly. Unfortunately, mid-way through the day on the Monday, all power in the building failed, knocking out the mainframes. Some herbert had forgotten to switch back over to mains power, and the generator duly ran out of diesel.
-
Friday 12th December 2014 17:10 GMT Phil O'Sophical
Adding to the diesel stories, I know of a site that had diesel backup which was regularly tested. Once a month, for many years, the generator was started, fuel levels verified, etc. Unfortunately the diesel was run for about 2 minutes each time, which had the same effect as lots of short journeys in a car. When the power failed years later the generator started and took over the load flawlessly...until it warmed up. Once hot, the thoroughly coked-up engine misfired, wouldn't keep speed, and spluttered to a halt. It took a head-off engine rebuild to get it back online.
-
-
-
Friday 12th December 2014 19:39 GMT HelpfulJohn
"The Dish".
A brilliant film. Very moving in places and full of understated Oz/UK-style humour. The Oz politicians are wonderfully inept, witless and charming.
The American from NASA is so USAlien he's just barely credible but he is a perfect foil for the more relaxed Ozlanders, also a great straight man.
http://en.wikipedia.org/wiki/The_Dish
And the radio telescope's dish is still in the middle of a sheep meadow.
-
-
-
Friday 12th December 2014 16:31 GMT Anonymous Coward
@smudge
Reading the page you linked to, it is the most unfortunate bit of trumpet blowing imaginable, since it is telling us that "nothing can go wrong".
My own UPS horror story? Power cuts out briefly, battery UPS comes on line, there is a thump and a roar as the Diesel generators cut in. Then after a minute white smoke is noticed emerging from the stack by the Diesel shed, the lights flicker and there is an enormous bang. Lights go out.
The maintenance department, the last time the standby generators were serviced, cleaned out the old oil and omitted to replace it. Subsequent inspections had apparently failed to notice that absence of oil level.
Very soon after, the post of facilities manager was vacant.
-
Friday 12th December 2014 17:06 GMT 142
@ arnaut
A similar thing happened, at a massive scale, at the Dublin Amazon AWS data centre a few years back http://www.siliconrepublic.com/enterprise/item/23084-mystery-surrounds-outage-at
They blamed a lightning strike initially but it appears to have been poorly configured failover gear.
-
Friday 12th December 2014 18:53 GMT Dave Henderson 1
Re: @smudge
Bit of a corner-cut stand-by genset if it didn't have the basics of a low oil / low pressure cut-out before it went clunk. Even 30 years ago a decent panel wouldn't have allowed it to run up and hold without pressure being present. Then again, I used to see the most stupid cost-cutting on vital things that might only be needed once in a blue moon - then the kind of thing above would happen.
-
-
-
-
-
Friday 12th December 2014 16:12 GMT Anonymous Coward
Re: It would appear to be worse...
Redundancy means nothing - I work in engineering, and no matter how you plan, something always goes tits-up from little incidents.
Have a read here:
Logging in on my Slack notebook today was:
Linicks Law: If Murphy's law can fail, it will.
*OK wiki crap, but it reflects the paper I read on redundant redundancy once.
-
Friday 12th December 2014 16:25 GMT Bartholomew
Re: It would appear to be worse...
I've seen where they have slowly added more load, over years, than the system was designed to handle. The zero-crossover switches switch flawlessly to the synchronised backup generators and then fail because they are not rated to carry the larger currents. Then you need to call the sparks in to jury-rig a temporary workaround, which takes time.
-
-
-
Friday 12th December 2014 16:55 GMT Hans Neeson-Bumpsadese
Generator testing
I remember one site where they tested the backup generator religiously for a few minutes every week. When a major power outage did occur and they needed to run off the genny, it failed within the hour...all that testing had run the fuel down, and nobody had thought about topping it up.
-
-
-
Friday 12th December 2014 16:41 GMT Anonymous Coward
Big UPS
Each of the backup generators is rated for 5MW. And what is the holdup life on your home PC UPS? Typically they hold up long enough for a controlled shutdown.
Let's be optimistic and specify a 4 hour hold-up time. At 5MW that's 20 000kWh. For comparison, a typical 110Ah leisure battery rated at 12V can supply C/10 for about 4 hours before suffering serious damage. That's 44Ah * 12V = approx 500Wh.
20 000kWh = 40 000 standard leisure batteries. That's nearly half a million times more than the 10Ah unit in a typical PC UPS.
tl;dr: storing large amounts of electrical power is very expensive, that's why we still drive fossil fuel cars.
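For anyone who wants to re-run the sums above, here is a rough sketch in Python. All figures come from the comment itself; the function names are made up for illustration, and the result lands slightly under 40 000 only because it does not round 528Wh down to 500Wh.

```python
import math

# Figures taken from the comment above.
GENERATOR_MW = 5          # rating of one backup generator
HOLDUP_HOURS = 4          # the "optimistic" hold-up time
BATTERY_AH = 110          # typical leisure battery capacity
BATTERY_VOLTS = 12
DISCHARGE_HOURS = 4       # about 4 hours at C/10 before damage


def energy_needed_wh(power_mw, hours):
    """Energy in watt-hours to carry power_mw for the given hours."""
    return power_mw * 1_000_000 * hours


def usable_battery_wh(capacity_ah, volts, hours):
    """Usable energy when drawing C/10 (a tenth of capacity per hour)."""
    usable_ah = capacity_ah / 10 * hours
    return usable_ah * volts


needed = energy_needed_wh(GENERATOR_MW, HOLDUP_HOURS)
per_battery = usable_battery_wh(BATTERY_AH, BATTERY_VOLTS, DISCHARGE_HOURS)
batteries = math.ceil(needed / per_battery)

print(f"Energy needed: {needed / 1000:.0f} kWh")
print(f"Usable per battery: {per_battery:.0f} Wh")
print(f"Batteries required: {batteries}")
```

Either way you look at it, the answer is tens of thousands of batteries, which is the point being made.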
-
Friday 12th December 2014 16:49 GMT I ain't Spartacus
Jim Willsher,
What's the battery life on your UPS, to keep your PC alive? Oh, plus the phones, radios, lights, power to the other hundred PCs, links to all the radar data you need for ATC, links to airports etc? That probably takes a tad more battery than just to give 1000W at 240V to a single desktop.
Presumably the UPS has to keep everything up long enough for the generator to fire up to keep providing power. That's assuming something hasn't gone wrong with the internal power wiring, in which case there's external power coming in, it just can't be distributed (and neither can the power from the genny).
-
-
Friday 12th December 2014 16:29 GMT Inventor of the Marmite Laser
At least some stuff seems to be arriving/leaving Luton:
http://www.london-luton.co.uk/en/flights/
Interestingly NATS says:
NATS can confirm that a technical problem has been reported at Swanwick air traffic control centre.
UK airspace has not been closed, but airspace capacity has been restricted in order to manage the situation. We apologise for any delays and our incident response team has been mobilised.
Every possible action is being taken to assist in resolving the situation and to confirm the details.
Further information will be released as it becomes available.
-
Friday 12th December 2014 16:09 GMT David Pollard
If memory serves
Wasn't there a problem with the power supply a few years ago? As I dimly recall, there was a component like a smoothing capacitor in the common feed and it was this that failed. The power supply went down and the UPS came up and was connected to a short; or something along those lines. Everything worked perfectly apart from an unlikely fault that no one had foreseen.
-
Friday 12th December 2014 16:17 GMT Anonymous Custard
12 more sleeps 'til Santa?
Well all I can say is they've got 12 days to fix it, or my kids won't be happy if Rudolph and co don't get clearance to land.
That said, one of my kids won't be happy anyway, as the storm and power cut we had last night took out the micro-SD card in my Pi, so about 3 weeks' worth of her Minecraft tweaking (since the last back-up) looks like it may be a goner.
-
Friday 12th December 2014 18:07 GMT Salts
Re: 12 more sleeps 'til Santa?
@Anonymous Custard
I had the same problem last week; it was the SD card. I pulled the data off with ApplePi-Baker (OS X), though dd would work also, then wrote it back to another SD card and overwrote the boot files from the Raspbian image, and all was OK. It does not take long and it is worth the effort.
-
-
Friday 12th December 2014 16:26 GMT Anonymous Coward
Interesting watching in real time
At these times flightradar24.com makes for interesting browsing. Looks like the 14:15 LHR-JFK has just taken off. A flight from Oslo to LHR made two huge circles in the North Sea off Lowestoft (you can see the track if you click on a plane) while two Falcon 900s are at high altitude (40k feet) approaching the Thames. Guesses as to what they're up to on the back of a black helicopter...
-
-
Friday 12th December 2014 16:53 GMT I ain't Spartacus
It was when they turned the Christmas tree lights on. There was a loud bang. Everything was plunged into darkness, and then that sad whining sound you get as all the computers power down at once - and you know that it's going to be a loooooooonng time until everything is working properly...
-
Friday 12th December 2014 19:46 GMT HelpfulJohn
BTDT, the sudden silence as the aircon, PCs and just about everything else apart from the battery-powered exit lamps die is stunning.
First, everyone starts talking far too loudly, then the phones start ...
Then some wag mentions putting ten pence into the meter when everyone knows the minimum is a one pound coin.
-
-
-
Friday 12th December 2014 16:33 GMT YetAnotherPasswordToRemeber
Pointless latest update
What's the relevance of the latest update that NATS use MS Windows on their desktops? From the article, the restriction in airspace is the result of a power failure in their DC, so the update seems to be totally irrelevant to the article, other than to try and associate MS with this outage!
-
Friday 12th December 2014 16:43 GMT Anonymous Coward
Re: Pointless latest update
@YAPtoR - beat me to it. I too wonder why the repeated mention of Microsoft and Windows, when the situation was apparently caused by power failure - ie failure to supply the equipment and the MS software with sufficient electrons to operate.
It might have been better to point a finger or two at whoever was responsible for providing the power infrastructure.
-
-
-
Saturday 13th December 2014 13:32 GMT Bloakey1
Re: " run by Serco, with IT outsourced to Capgemini, Amore Group Attenda, BT and Vodafone"
"They make sure that you attend and share the love. Perhaps they organise the Christmas party?"
They attenda in persona when you no paya da billa. They breaka da kneesa and da fingersa all for your owna gooda.
The Irish Mafia make you an offer you can't understand; this mob make you an offer with extra vowels and optional horse heads in bed.
-
-
-
-
Friday 12th December 2014 19:52 GMT HelpfulJohn
Re: Just wait until
That actually happened where I worked, so we got two feeds from the Grid, one coming in from the left and one from the right and about as independent as they could be. With the oil-gennies and the battery packs it made the power rooms a little complicated. Interesting, though.
I'm sure the JCB wasn't the only reason for the double feed but it did help the case.
-
-
-
Friday 12th December 2014 17:59 GMT Gary F
Any critical piece of infrastructure needs to have a site B, so even if other measures fail such as independent feeds from separate substations, UPS, and diesel generators, you should be able to fall back to site B. Obviously some people don't think the control room in question was a critical enough asset to have a site B.
Luckily these days, as far as servers go, it's easy to move virtual servers from site A to site B within seconds, sometimes without any noticeable downtime. But relating to this article, I have no idea if the radar and communications systems can be used from locations further afield. Hopefully lessons have been learnt and more resilience will be added. I'm just concerned that flights in the air could be at risk if this happened again.
-
Friday 12th December 2014 19:10 GMT pepper
No, you have secondary radar and control systems (part military) that can take over in an emergency. Not sure what is going on right now but it might be more than just a power failure.
I remember reading about a software error that caused the reported altitude to be wrong in a piece of software for the ATC, I think that was in San Francisco.
-
Saturday 13th December 2014 10:04 GMT Afernie
I dunno...
"Any critical piece of infrastructure needs to have a site B, so even if other measures fail such as independent feeds from separate substations, UPS, and diesel generators, you should be able to fall back to site B. "
Isla Sorna didn't turn out any better than Isla Nublar, to be honest.
-
-
Friday 12th December 2014 18:19 GMT Jon 9
Generator stories....
If we're on generator stories mine from experience...
The backup generator isn't big enough to power the whole building, so there are essential and non essential supplies. The generator has a day tank with enough fuel for a few hours running and there's a bigger main tank which is below ground. There is a pump that fills the day tank from the main tank when the level drops.....
All works great, except that the pump was connected to a non-essential supply.....
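The kind of check that would have caught this can be sketched in a few lines of Python. This is only an illustration: the component names and the wiring table are hypothetical, not taken from any real site.

```python
# Hypothetical model of the plant in the story; names are illustrative only.
ESSENTIAL = "essential"
NON_ESSENTIAL = "non-essential"

# Which supply each piece of kit is wired to.
supply_of = {
    "servers": ESSENTIAL,
    "day_tank_fill_pump": NON_ESSENTIAL,  # the mistake in the story
    "office_lighting": NON_ESSENTIAL,
}

# What the generator needs in order to keep running once the day tank empties.
generator_depends_on = ["day_tank_fill_pump"]


def misplaced_dependencies(depends_on, supply_of):
    """Return the dependencies that are not on the essential supply."""
    return [name for name in depends_on if supply_of.get(name) != ESSENTIAL]


problems = misplaced_dependencies(generator_depends_on, supply_of)
print(problems)  # prints ['day_tank_fill_pump']
```

Anything the generator itself relies on has to sit on the supply the generator feeds, otherwise the backup only lasts as long as the day tank.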
-
Friday 12th December 2014 18:59 GMT Anonymous Coward
Re: Generator stories....
This is why marine systems have a manual pump to fill the day tank as backup for when the electric pump fails. And a honking great siren connected to the level sensor.
There are so many backup generator horror stories on this thread that one would have thought it would be a solved problem by now. But as Chernobyl showed, not enough information is shared about incidents to ensure that obvious fails don't happen. Clearly someone should write a book called Backup Generator f**kups and how to avoid them. Perhaps it will become as popular as How to avoid huge ships on Amazon.
-
Friday 12th December 2014 19:33 GMT Anonymous Coward
Re: Generator stories....
From my work-related stories that I cannot attribute, the big problem often is that nobody actually tests the whole system by pulling the big red knob-ended lever that disconnects the whole building to see what happens.
Too much risk, they say! Then one day the grid goes down and they find the room full of servers stays up, but the A/C goes down as it was not UPS'd, followed in the short term by the server hardware expiring...
-
Friday 12th December 2014 19:59 GMT HelpfulJohn
Re: Generator stories....
Actually ... during a tour given by one middle-boss, as he was explaining that under no circumstances does anyone ever, *ever* *EVER* press The Big Red Knob ... he managed to press it.
Darkness, silence and the many pointing of fingers ensued.
H&S, and basic sanity, required that we kept The Big Red Knob, though. We just added more "be farking careful!" notices, memos and directives.
If there is a failure mode, it will happen. If a certain failure mode is impossible, it will happen. If a failure mode requires human intervention to happen, *that* will happen.
-
-
-
-
Friday 12th December 2014 18:46 GMT Anonymous Coward
Generator story: Capacity planning
Substation outside datacentre gives up the ghost.
UPS works flawlessly.
Generator comes on-line with no problem.
Electricity supplier comes to do emergency repair to substation - one phase completely burnt out - fair amount of work required...
Slight problem: generator has been sized to power the equipment in the datacentre, but for some reason, someone either forgot, or decided not to include powering the data-centre air-cooling/conditioning in the calculations. Ooops.
So, the loading bay airlock doors are opened wide, and mobile fans brought in to get outside airflow. Engineer monitoring hard-disk temperatures of the big-iron says we can let them go up to a certain temperature, but above that, the warranty no longer applies.
Decision was taken to void warranties - and luckily nothing happened immediately, and thankfully, there wasn't a rash of disk failures afterwards - but it had a lot of people sweating.
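As a back-of-envelope illustration of the sizing mistake, here is a minimal Python sketch. Every number in it is invented for the example; none of them come from the site in the story.

```python
def generator_shortfall_kw(generator_kw, loads_kw):
    """Return how many kW the loads exceed the generator rating by (0 if none)."""
    total = sum(loads_kw.values())
    return max(0, total - generator_kw)


# All figures below are made up purely to show the calculation.
generator_kw = 450
it_only = {"it_equipment": 400}
with_cooling = {"it_equipment": 400, "cooling": 200}

print(generator_shortfall_kw(generator_kw, it_only))       # prints 0
print(generator_shortfall_kw(generator_kw, with_cooling))  # prints 150
```

Sized against the IT load alone the generator looks fine; add the cooling to the same sum and the shortfall shows up on paper rather than during an outage.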
-
Friday 12th December 2014 18:47 GMT Anonymous Coward
5 finger pointing game
"..small IT team, with common-or-garden IT outsourced to Serco, Capgemini, Amore Group Attenda, BT and Vodafone"
5 different suppliers for run of the mill stuff... *roll_eyes* you can imagine the calls: The first 15 mins of the conference call will be "xyz has joined the call" followed up with "has abc joined yet" from the latecomers... It's amazing they get anything done!
-
Friday 12th December 2014 20:49 GMT Anonymous Coward
I've found the problem!
"Air traffic services are run by a relatively small IT team with knowhow and support from Lockheed Martin. Common-or-garden tech is outsourced to Serco, Capgemini, Amore Group Attenda, BT and Vodafone."
I bet that the real reason half of England's airspace is restricted is because all the circular finger-pointing between the group listed above has created a dangerous cyclonic rotation over the southern UK :)
-
Friday 12th December 2014 22:45 GMT Anonymous Coward
Various power backup stories
Back in the mid-2000s, a data centre outsourcer type company in central London just kept adding more and more servers until they melted their feed back to the local substation! They had a bunch of generators but it took so long to get that cable replaced that 60 - 70% of customers moved to somewhere else.
Global Switch 2 in Docklands were doing some testing in 2008 or 2009 of their much-vaunted power capabilities... and multiple floors lost power, 3 or 4 of them iirc. At the time I was told that they get power into the building from two separate sections of the National Grid (130kV straight into their own substations) and each floor has two PDUs, supposedly fed from different rotary UPSes on the roof from the two different power feeds. How they managed to balls all that up I never heard.
During some expansion work at EMC's manufacturing plant in Cork in Ireland in the early noughties a JCB hit overhead power lines and the entire plant shut down. It was about 3 weeks before end of quarter and the floor was full of Symm 4's, so when the power went the noise of all that kit spinning down was awesome... it went on and on and on.
The last one is from when we were having PDU work done in a smallish datacentre. The electricians did all the PDU work overnight, and then at 6am one very tired electrician turned off the other working PDU instead of turning on the repaired PDU.
-
Saturday 13th December 2014 13:23 GMT CCCP
Diesels...
Are a bugger.
Many moons ago I was an airport fire engine truck driver. It is obviously pretty important they don't fail if a 747 is burning after a crash.
So we drove them for one hour every day. Clearly not practical for a UPS scenario, but it does point the way if you really want an oil burner UPS.
Separately, since fuel consumption is of no concern in UPS scenarios, why is petrol not considered? I suspect it's because generators are built for long term fuel efficient use.
-
Saturday 13th December 2014 14:27 GMT -tim
It's progress! Right?
I wonder how many ATC systems were written by people who learned object-oriented programming from Booch books, where the common example was an ATC system that only a programmer would ever consider. ATC systems should never have to consider where the plane is; they should focus on where the plane might be. Otherwise things get odd when there are failures.