21st Century Networks, resilient networks, self-healing networks, global re-routing, circular networks, mesh networks,...
That was the dream...
A raft of BT customers in the UK were knocked offline this morning due to a power problem at one of its web peering partners’ sites in London. According to outage monitoring website Down Detector, customers reported a spike in problems at around 9am today. One customer got in touch to report that all their BT Infinity …
Yes, that was the original idea. The problem is that the global internet is now many, many orders of magnitude larger than was originally foreseen (don't forget - the original "internet" only had 20 or so nodes on it), which makes operating it a whole lot harder. For example, the original routing protocols have long been pretty much abandoned (except in small networks, where they work just fine) and more complex protocols have been developed and deployed. However, these protocols can take longer to converge to an answer if there is a massive reconfiguration of the internet (such as a major node failure). Added to this, no protocol can compensate for stupid network design (e.g. running primary & backup cables through the same duct, having both primary and backup systems on the same power supply, etc).
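To make the shared-duct point concrete, here is a minimal Python sketch of a shared-risk check. Everything in it is hypothetical: the function name `shared_risks`, the resource labels and the example values are made up for illustration and do not describe any real operator's plant.

```python
from typing import Dict, Set


def shared_risks(primary: Dict[str, str], backup: Dict[str, str]) -> Set[str]:
    """Return the kinds of resource (duct, power feed, ...) that the
    primary and backup paths both depend on.

    Each path is described as {resource_kind: resource_id}. If both
    paths name the same resource_id for a kind, one failure of that
    resource takes out both paths.
    """
    return {kind for kind, res in primary.items() if backup.get(kind) == res}


# Hypothetical example: separate power feeds, but the same cable duct.
primary_path = {"duct": "duct-A", "power_feed": "feed-1"}
backup_path = {"duct": "duct-A", "power_feed": "feed-2"}

print(shared_risks(primary_path, backup_path))
```

An empty result is what you want; anything else is a single point of failure that no routing protocol can route around.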
Apparently BT are fixing it at the moment with their help desk.
So far they've:
1) Disconnected all equipment from other sockets (this is what took down the BBC website)
2) Plugged their router into the master socket (this took down their trunks and backbone)
3) Rebooted their router
4) Taken delivery of another router and tried that
So they're now awaiting an engineer visit. Should be fixed by next Friday.
"So they're now awaiting an engineer visit. Should be fixed by next Friday."
However, the engineer will not show up but they will subsequently be told that they will be charged if the engineer is to return and that there is no availability of an engineering appointment for another couple of weeks.
It's been like this since about 7am this morning at my workplace. Currently at 10:30 it's still intermittent. Some websites will load without problems and others will take forever to load.
Which is annoying as I want to order some Hammerite paint from Amazon for my car.
AC because I shouldn't be shopping at work.
Had just the same, except in my case it's a hard drive caddy I'm after.
Just managed to get to Amazon though (11am) and ordered it, so give it a try.
Also had the irony of getting an RSS feed message from the BBC News about their article on the subject, but not being able to open it due to the issue...
We use SIP with BTNet. As far as I can tell, it hasn't been affected, whereas we are having problems accessing web sites and with remote users trying to access servers on-site.
That said, it hasn't helped me at all to try and get through to a live person at BTNet. Being told to call back later is not at all impressive.
"At the moment, BT would just appeal to the EU and it would be overthrown as there's a similar problem in Germany with Deutsche Telekom."
And if OR were to be split off how long do you think it would be before it was bought by Deutsche Telekom, or Telefonica - or maybe SoftBank?
Millions, if not indeed billions, are spent on (advertising) network resilience yet still server centres and other installations fall over, go "off grid", suffer "outages" or "unplanned downtime".
Is it simply impossible to prevent these occurrences? Is all the advertising about resilience etc complete dishonest bollocks?
Or are the PoP operators just lying to us on the grounds that it is so much cheaper to be a crook than try and actually build in genuine resilience?
And what about all these certificates they display so proudly on their websites? Are these all lies as well? Are the awarding bodies just in on the scam and taking the dosh while they can? Shouldn't an operator suffering one of these unexpected "inconveniences" lose their accreditation? And what about some com-pen-pay-shun?
I don't know particulars of BT, but you hit some good points there.
"Millions, if not indeed billions, are spent on (advertising) network resilience yet still server centres and other installations fall over, go "off grid", suffer "outages" or "unplanned downtime"." Indeed. Advertising brings in revenue. Infrastructure is just an expense. It's not uncommon to increase spending on the services (like advertising) while cutting expenses on the infrastructure that supports those revenue services. Years ago at a small chain retailer, the manager explained to me that because we were all paid on commission, "we polish the displays but nobody fixes the roof."
"Is it simply impossible to prevent these occurrences?" Not impossible, but it requires awareness, and decision-makers must be rewarded for solid planning over short-term results. "Is all the advertising about resilience etc complete dishonest bollocks?" Not exactly. I've seen very resilient designs get crippled by small decisions like using the redundant link to handle load spikes instead of renting a metered link. As so often in this world, people prefer data that supports their message and may not even be aware of how the facts have changed.
"And what about all these certificates they display so proudly on their websites? Are these all lies as well?" Yeah, sometimes. :) The certificates have very specific definitions. "Certified Malware Free" is much easier than "Scanned Every Hour According To OpSec 15(a) Which Has Been Due For Review For Two Years And Meanwhile We Changed Vendors And Our Tech Lead Left To Join A Startup So Nobody Really Understands It Any More But It Seems To Work Fine And We Are In Compliance With Our Accreditation." Again, not unique to IT. We probably all know someone who bought a very expensive car and then "saved money" by deferring maintenance. Or bought insurance but neglected to raise the limit after some major purchase.
Okay, you nailed the big ones. I just spent too much time in Operations!
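For what it's worth, the redundant-link point can be put as simple arithmetic. The Python sketch below is only an illustration under made-up assumptions: the function name `survives_single_link_failure` and the capacity figures are hypothetical, and it treats links as interchangeable pipes, which real networks are not.

```python
from typing import Sequence


def survives_single_link_failure(capacities_mbps: Sequence[float],
                                 peak_load_mbps: float) -> bool:
    """True if the peak load still fits after losing the largest link.

    With fewer than two links there is no redundancy at all.
    """
    if len(capacities_mbps) < 2:
        return False
    # Worst case: the biggest link is the one that fails.
    remaining = sum(capacities_mbps) - max(capacities_mbps)
    return remaining >= peak_load_mbps


# Hypothetical pair of 1000 Mbps links.
links = [1000, 1000]

# Peak of 800 Mbps: the spare link really is spare.
print(survives_single_link_failure(links, 800))

# Peak of 1400 Mbps: the "redundant" link is already carrying load spikes,
# so losing either link means an outage despite the design on paper.
print(survives_single_link_failure(links, 1400))
```

The design has not changed between the two cases, only the traffic, which is how a resilient network quietly stops being one.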
Power Outage you say?
So, let me get this straight, Harbour Exchange in London's Docklands, one of THE most important links in the whole UK network, does not have adequate provision for when there is a power outage, either internally or externally.
Effectively no resilience then?
GOD'S TEETH!!!!!!
Glad to see it's not just me who sees the red mist with this sort of wording. It doesn't matter if it is a small number of customers, for those customers it is a complete loss of service.
PlusNet has been having occasional lie-downs in a dark room with a damp cloth over its forehead for a week or so now. Again, the mysterious power issue might be causing some problems for a small number of customers (which always seems to include me). The falling over seems to have started round about the time they told me my bill would be increasing to pay 22 tattooed millionaires to kick a ball around on a TV channel I don't watch.
If you're seeing an unstable connection via Plusnet you may be suffering the same thing I've had. It seems IP addresses starting with 51. are seeing packet loss. Go to their addons in your account and add a static IP. They'll add it to your bill but I'm planning on arguing for a refund as they've set a precedent in their forums by giving it to someone else for free for the same issue. My IP switched to an 81. and my problems seem to have gone away.
Compared to, say, the number of particles in the known Universe, it was a small number of customers affected.
Perhaps the regulators should force companies to release an estimate of the percentage of customers affected together with any qualifying factors such as geographical limitations on the disruption.
If you choose BT you don't deserve good service. And I think you know that.
The simple test is: does your ISP advertise on TV? If so, they're shit and need to advertise on TV. (They also spent all the money for expanding capacity on advertising.)
All the best things in this world don't need advertising because people just know they're awesome.