GoDaddy Outage: RFC for Dummies

Yesterday was a black day for GoDaddy.com. For several hours, all their hosting services were down: mail, websites and, worse, all their DNS services. According to a claim on Twitter, the outage was caused by a member of Anonymous, but this is not yet confirmed. Personally, I don’t care who’s behind the attack! The result is the same: millions of websites were unreachable for hours. Some people started to blame GoDaddy and to exhort customers to move to another provider. Do you really think other companies would withstand a massive DDoS attack? I don’t!

Let’s leave this aside and focus on the consequences. Lots of websites were simply unreachable because their hostnames could not be resolved. Wait a minute! When I first connected to the Internet (and, trust me, I’ve been around for a while!), everybody told me that this super-network was derived from a military project. The goal was to build a super-resilient meshed network able to withstand almost any attack from the “enemies”. Today, in 2012, millions of sites are affected by a “simple” attack! Is there a problem somewhere?

Are people entitled to complain about GoDaddy for not providing the services they subscribed to? Is quickly moving to another provider the best choice? I don’t think so. In my opinion, the Internet has become a mass medium like any other, and people tend to forget the complexity that lies behind nice websites with beautiful interfaces and plenty of features. The Internet (read: “the set of all protocols used to build the Internet”) relies on RFCs (“Requests for Comments”). These documents are memoranda published by the IETF (“Internet Engineering Task Force”) that describe how to build a working Internet. If you are a developer, manufacturer or designer, RFCs must be your golden rules!

Back to the GoDaddy story! There is a very interesting RFC, RFC 2182, titled “Selection and Operation of Secondary DNS Servers”. If you read it (please do!), you will find best practices for defining secondary DNS servers for your domain(s). How many do you need? How should you deploy them? Let’s take a simple example: digitalz.org. This domain is hosted by GoDaddy:

$ dig digitalz.org ns

; <<>> DiG 9.8.1-P1 <<>> digitalz.org ns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20319
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;digitalz.org. IN NS

;; ANSWER SECTION:
digitalz.org. 3600 IN NS ns13.domaincontrol.com.
digitalz.org. 3600 IN NS ns14.domaincontrol.com.

;; Query time: 68 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Sep 11 08:30:25 2012
;; MSG SIZE rcvd: 85
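If you have many domains to check, you can script this first step. Here is a minimal Python sketch that extracts the NS targets from dig output. It assumes the default text layout shown above (owner, TTL, class, type, data on each answer line). The function `ns_records` and the `SAMPLE` string are hypothetical names used for illustration only.

```python
def ns_records(dig_output: str) -> dict[str, list[str]]:
    """Map each owner name to its NS targets, parsed from dig's text output."""
    records: dict[str, list[str]] = {}
    for line in dig_output.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments: ";;" headers and the ";name IN NS" question.
        if not stripped or stripped.startswith(";"):
            continue
        fields = stripped.split()
        # Answer lines are expected as: owner TTL class type rdata
        if len(fields) >= 5 and fields[2] == "IN" and fields[3] == "NS":
            records.setdefault(fields[0], []).append(fields[4])
    return records


# Excerpt of the dig output above.
SAMPLE = """\
;; QUESTION SECTION:
;digitalz.org. IN NS

;; ANSWER SECTION:
digitalz.org. 3600 IN NS ns13.domaincontrol.com.
digitalz.org. 3600 IN NS ns14.domaincontrol.com.
"""

print(ns_records(SAMPLE))
```

Parsing human-readable output is fragile. For real scripts, `dig +short digitalz.org ns` or a proper DNS library is a safer starting point.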

But look at where the two registered nameservers (ns13 and ns14) live:

$ host ns13.domaincontrol.com.
ns13.domaincontrol.com has address 216.69.185.7
ns13.domaincontrol.com has IPv6 address 2607:f208:206::7
$ host ns14.domaincontrol.com.
ns14.domaincontrol.com has address 208.109.255.7
ns14.domaincontrol.com has IPv6 address 2607:f208:302::7

Both addresses sit in GoDaddy’s own network, in netblocks registered to GO-DADDY-COM-LLC:

NetRange: 216.69.128.0 - 216.69.191.255
CIDR: 216.69.128.0/18
OriginAS:
NetName: GO-DADDY-COM-LLC
NetHandle: NET-216-69-128-0-1
Parent: NET-216-0-0-0-0
NetType: Direct Allocation
RegDate: 2004-05-24
Updated: 2012-02-24
Ref: http://whois.arin.net/rest/net/NET-216-69-128-0-1

NetRange: 208.109.0.0 - 208.109.255.255
CIDR: 208.109.0.0/16
OriginAS:
NetName: GO-DADDY-COM-LLC
NetHandle: NET-208-109-0-0-1
Parent: NET-208-0-0-0-0
NetType: Direct Allocation
RegDate: 2006-04-12
Updated: 2012-02-24
Ref: http://whois.arin.net/rest/net/NET-208-109-0-0-1
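Two different prefixes can give a false sense of independence. The following minimal Python sketch uses the standard `ipaddress` module to map each nameserver address to the whois netblock that covers it. The netblocks and addresses are copied from the outputs above. The helper `holder_of` and the two lookup tables are hypothetical, for illustration only.

```python
import ipaddress
from typing import Optional

# Netblocks and holder copied from the ARIN whois records above (IPv4 only).
NETBLOCKS = {
    ipaddress.ip_network("216.69.128.0/18"): "GO-DADDY-COM-LLC",
    ipaddress.ip_network("208.109.0.0/16"): "GO-DADDY-COM-LLC",
}

# IPv4 addresses returned by "host" for the two nameservers.
NAMESERVERS = {
    "ns13.domaincontrol.com": "216.69.185.7",
    "ns14.domaincontrol.com": "208.109.255.7",
}


def holder_of(address: str) -> Optional[str]:
    """Return the holder of the netblock covering `address`, or None if unknown."""
    ip = ipaddress.ip_address(address)
    for network, holder in NETBLOCKS.items():
        if ip in network:
            return holder
    return None


holders = {name: holder_of(addr) for name, addr in NAMESERVERS.items()}
print(holders)
# Different prefixes, yet a single holder: the servers are not independent.
print("distinct holders:", len(set(holders.values())))
```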

Finally, have a look at the BGP routes to those IP ranges: they are all originated by the same autonomous system (AS26496). Here is the routing table entry for 208.109.255.0/24:

BGP routing table entry for 208.109.255.0/24, version 111874851
Paths: (5 available, best #2, table Default-IP-Routing-Table)
Multipath: eBGP iBGP
  Advertised to update-groups:
     3         
  26496
    195.69.144.26 (metric 20) from 195.26.4.255 (195.26.4.255)
      Origin IGP, metric 1000, localpref 100, valid, internal
      Community: 5577:2000 5577:2100 5577:2103 5577:5000 5577:5002
      Originator: 195.26.4.133, Cluster list: 0.0.0.2
  26496
    195.69.144.26 (metric 20) from 195.26.4.254 (195.26.4.254)
      Origin IGP, metric 1000, localpref 100, valid, internal, best
      Community: 5577:2000 5577:2100 5577:2103 5577:5000 5577:5002
      Originator: 195.26.4.133, Cluster list: 0.0.0.1
  46786 26496
    199.59.206.17 from 199.59.206.17 (204.26.60.249)
      Origin IGP, localpref 100, valid, external
      Community: 5577:2000 5577:2100 5577:2150 5577:2199 5577:5000 5577:5001
  46786 26496
    199.59.206.29 from 199.59.206.29 (204.26.60.249)
      Origin IGP, localpref 100, valid, external
      Community: 5577:2000 5577:2100 5577:2150 5577:2199 5577:5000 5577:5001
  3549 26496
    208.178.63.97 from 208.178.63.97 (67.17.80.136)
      Origin IGP, metric 100, localpref 49, valid, external
      Community: 3549:4698 3549:31528 5577:1000 5577:1001 5577:5000 5577:5001
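This can be checked mechanically. However many upstreams carry the route, every AS_PATH ends with the same origin AS. Below is a small Python sketch with hypothetical helper names. It takes the five AS paths from the entry above as plain strings. It assumes simple space-separated AS numbers, so AS_SET entries in braces would need extra parsing.

```python
def origin_asns(as_paths: list[str]) -> set[int]:
    """Return the origin ASNs: the right-most AS of each AS_PATH."""
    origins: set[int] = set()
    for path in as_paths:
        hops = path.split()
        # An empty AS_PATH means a locally originated route; nothing to record.
        if hops:
            origins.add(int(hops[-1]))
    return origins


def transit_asns(as_paths: list[str]) -> set[int]:
    """Return every AS seen before the origin (the upstream/transit ASes)."""
    return {int(asn) for path in as_paths for asn in path.split()[:-1]}


# The five AS paths listed for 208.109.255.0/24 above.
PATHS = ["26496", "26496", "46786 26496", "46786 26496", "3549 26496"]

print("transit:", sorted(transit_asns(PATHS)))  # transit: [3549, 46786]
print("origin:", origin_asns(PATHS))            # origin: {26496}
```

The routes arrive through several paths but share one origin. This is why multi-homing alone does not remove the single point of failure.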

As you can imagine, any issue with this autonomous system would have a huge impact on the services (being multi-homed would not solve all the problems). There are plenty of nightmare stories about BGP issues. The best practice here is to use multiple DNS servers spread geographically (e.g., one on each continent) and connected to multiple, totally independent backbones. In other words: don’t put all your eggs in one basket! Always keep in mind that RFCs are your best friends. Follow and implement them to increase the availability of your online services.
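The same rule can be turned into a rough pre-flight check for your own NS set. This Python sketch assumes you have already looked up each server’s IPv4 address and origin AS, as done above. The `NameServer` type, the `diversity_problems` function and its thresholds (`min_servers`, `min_asns`) are hypothetical choices for illustration, not values taken from RFC 2182. The check cannot verify geographic spread, which needs information beyond these two fields.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NameServer:
    name: str
    ipv4: str
    origin_asn: int


def diversity_problems(servers: list[NameServer], min_servers: int = 2,
                       min_asns: int = 2) -> list[str]:
    """Return human-readable problems with an NS set; an empty list means none found."""
    problems = []
    if len(servers) < min_servers:
        problems.append(f"only {len(servers)} nameserver(s), want at least {min_servers}")
    asns = {s.origin_asn for s in servers}
    if len(asns) < min_asns:
        problems.append(f"only {len(asns)} distinct origin AS(es): {sorted(asns)}")
    # Heuristic: servers whose addresses share a /24 very likely share a network segment.
    prefixes = {ipaddress.ip_network(f"{s.ipv4}/24", strict=False) for s in servers}
    if len(prefixes) < len(servers):
        problems.append("several nameservers share the same /24 prefix")
    return problems


# digitalz.org, using the addresses and origin AS found above.
GODADDY = [
    NameServer("ns13.domaincontrol.com", "216.69.185.7", 26496),
    NameServer("ns14.domaincontrol.com", "208.109.255.7", 26496),
]

print(diversity_problems(GODADDY))
# ['only 1 distinct origin AS(es): [26496]']
```

The check passes the prefix test for digitalz.org but fails on the origin AS, which matches what the BGP table shows. Treating a shared /24 as the “same network” is only a rule of thumb.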

Outages like GoDaddy’s are always a good opportunity to recall best practices. We learn from our mistakes!
