An Operational Reason for Knowing Trivia
I've been largely out of touch with the IT certification scene lately, but I'm sure that people are still complaining incessantly about having to memorize "trivia" in order to pass certification tests. Back when I was teaching Cisco classes full-time, my certification-oriented students were particularly bitter about this. Of course, this is a legitimate debate, and the definition of "trivia" varies from person to person.
When I saw this article about CloudFlare's world-wide router meltdown, however, I immediately felt a bit smug about all those hours spent learning and teaching about packet-level trivia. If you don't want to read the article, here's the tl;dr:
- their automated DDoS detection tool detected an attack against a customer using packets sized in the 99,000-byte range
- their ops staff pushed rules to their routers to drop those packets
- their routers crashed and burned
In order for this meltdown to happen, they had to have a compounded series of errors:
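- the attack detection tool was coded to allow detection of packet sizes that can't actually occur: no bounds checking. (The IPv4 header's Total Length field is 16 bits wide, so no IPv4 packet can be larger than 65,535 bytes.)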
- the ops staff didn't retain the "trivia" that they learned in Networking 101, and thus couldn't see the problem with the output generated by the detection tool.
- the router OS didn't do input validation, and blew up when attempting to configure itself to do something crazy.
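To make the "no bounds checking" point concrete, here's a minimal sketch of the sanity check that either the detection tool or the router could have applied before accepting a packet-length rule. It assumes IPv4 only, where the 16-bit Total Length field caps a packet at 65,535 bytes and the smallest legal header is 20 bytes. The function and constant names are hypothetical, made up for illustration; this is not CloudFlare's code or any router vendor's API.

```python
# Hypothetical names throughout; this only illustrates the bounds check.
MAX_IPV4_PACKET_BYTES = 2**16 - 1  # 65,535: largest value a 16-bit Total Length field can hold
MIN_IPV4_PACKET_BYTES = 20         # smallest legal IPv4 header, with no options and no payload


def validate_length_range(low: int, high: int) -> tuple[int, int]:
    """Return (low, high) if packets of those sizes can exist, else raise ValueError."""
    if low > high:
        raise ValueError(f"empty range: {low} > {high}")
    if low < MIN_IPV4_PACKET_BYTES or high > MAX_IPV4_PACKET_BYTES:
        raise ValueError(
            f"range {low}-{high} is outside the possible IPv4 packet sizes "
            f"({MIN_IPV4_PACKET_BYTES}-{MAX_IPV4_PACKET_BYTES} bytes)"
        )
    return low, high


if __name__ == "__main__":
    print(validate_length_range(64, 1500))  # an ordinary, plausible rule
    try:
        validate_length_range(99_000, 99_999)  # the kind of rule from the tl;dr
    except ValueError as err:
        print(f"rejected: {err}")
```

A check like this is only a few lines, but somebody has to know the 65,535 number to write it, which is the whole argument for the "trivia."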