Tuesday 2 July 2019

Securing Internet Routing: RPKI OV and ROAs

For some time now, I've had a ticket in my queue to "Investigate RPKI". A few weeks ago, we experienced some strange internet outages that turned out to be because not all is well with RPKI Origin Validation at one of our upstream ISPs...

The Internet owes much of its success to the marvel that is BGP based routing. Of course, like so many "early" Internet protocols and standards, it's very open and trusting. Just about anyone can claim to be a route source (hence "BGP Prefix Hijacks" are a thing). This is sort of an "Achilles Heel" in the global Internet.

The Internet community has, therefore, long thought about more secure ways of doing routing, but they move slowly, as BGP is (by necessity) conservative in changes. There are a number of IETF Working Groups working on an overall solution called "Secure Inter Domain Routing" (SIDR, not to be confused with CIDR!).

Part of this is RPKI, and the first really workable section of that is Origin Validation (OV). This takes three parts: cryptographically sound declarations of valid origin AS numbers (ROAs) for specific prefixes; validation servers that routers can query; and routers acting on that information (OV). The first one is very easy and relatively low risk (indeed NOT creating and publishing a ROA carries a risk if your ISP is doing OV); the other two, considerably less so.

RPKI and ROAs


Route Origin Authorizations (ROAs) basically establish a chain of trust (similar to a TLS certificate CA) from the Regional Internet Registries (RIRs) to cryptographically signed declarations of what Autonomous Systems (ASs) ought to be originating a prefix (advertising reachability as the last step in the routing pathway [AS-PATH]). This makes sense, because the RIRs are the organisations that "dish out" both IP address resources and AS numbers, so they know which organisations legitimately should originate which prefixes.

Clever people thought that it would not be a good idea to have routers try to solve Internet scale problems. They have created a number of servers (software diversity can be a healthy thing) that essentially parse the ROA feeds of the various RIRs (and some other sources) to determine what ASs can validly originate a prefix. There are "public" servers of this sort you can use, but they're really something you want to have a high degree of trust in - so running your own, or relying only on those operators you *really* trust is wise (if your routers do Origin Validation). Relying Party servers are available from several places, including: rpki-validator; Routinator; rpki.net; RTRLib; rpki-client(8); Cloudflare have a service and validator; there are others if you hunt around.

Routers can then use a fairly simple protocol (RPKI-RTR) to fetch the validated ROA data from Relying Party Servers (aka Validator Caching Servers), use it to ascertain whether a particular prefix is Valid, Invalid or Unknown, and act accordingly - aka Origin Validation (OV).
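To give a feel for how simple RPKI-RTR is on the wire, here is a rough sketch of the "IPv4 Prefix" message as I read it in RFC 6810: a fixed 20 byte record carrying a prefix, a maximum length and an origin AS. The function names are my own, the max length of 21 is made up, and the prefix and AS number are borrowed from the looking glass output further down; treat it as an illustration rather than a working RTR client.

```python
import ipaddress
import struct

# IPv4 Prefix PDU layout (RFC 6810): version, type (4), reserved, length (20),
# flags, prefix length, max length, reserved, IPv4 prefix, origin ASN.
IPV4_PREFIX_PDU = struct.Struct("!BBHIBBBB4sI")


def encode_ipv4_prefix(prefix, max_length, asn, announce=True):
    """Build the PDU a cache would send for one validated ROA entry."""
    net = ipaddress.IPv4Network(prefix)
    flags = 1 if announce else 0  # lowest bit: 1 = announce, 0 = withdraw
    return IPV4_PREFIX_PDU.pack(0, 4, 0, IPV4_PREFIX_PDU.size, flags,
                                net.prefixlen, max_length, 0,
                                net.network_address.packed, asn)


def decode_ipv4_prefix(pdu):
    """Parse the PDU back into (network, max_length, asn, announce)."""
    (_version, pdu_type, _zero, length, flags, prefix_len, max_len,
     _zero2, addr, asn) = IPV4_PREFIX_PDU.unpack(pdu)
    if pdu_type != 4 or length != IPV4_PREFIX_PDU.size:
        raise ValueError("not an IPv4 Prefix PDU")
    net = ipaddress.IPv4Network((addr, prefix_len))
    return net, max_len, asn, bool(flags & 1)


# Hypothetical entry: the max length of 21 is an assumption for illustration.
pdu = encode_ipv4_prefix("146.231.128.0/21", 21, 37520)
print(decode_ipv4_prefix(pdu))
```

The point to notice is that the cache hands the router (prefix, max length, origin AS) tuples; the router itself compares its BGP routes against them.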

ROA decisions of caching servers come in three flavours - "Valid", "Invalid" and "Unknown". Most prefixes are "Unknown" - they don't have a ROA covering them. "Valid" denotes that a particular prefix has an origin AS that is covered by a legitimate ROA. "Invalid" means it's not covered by a valid ROA - something about the prefix is off (either it's the wrong size, or the origin AS is not in the ROA for the prefix). Basically, you're ascertaining whether the organisation that uses AS number XYZ is legitimately the "owner" of a particular prefix (X.Y.Z.Q/NN) (a routable IP address subnet, usually expressed in CIDR notation).
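The three outcomes can be sketched in a few lines of Python. This is my own minimal take on the logic (loosely following RFC 6811), not any vendor's implementation; the function name, the documentation prefixes and the AS numbers are all made up for illustration.

```python
import ipaddress


def validate_origin(prefix, origin_asn, vrps):
    """Classify a route as 'valid', 'invalid' or 'unknown'.

    vrps is a list of (roa_prefix, max_length, asn) tuples taken from ROAs.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in vrps:
        roa = ipaddress.ip_network(roa_prefix)
        if roa.version != route.version or not route.subnet_of(roa):
            continue  # this ROA says nothing about the route
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but matching none of them: invalid.
    return "invalid" if covered else "unknown"


# Hypothetical ROA data using documentation prefixes and AS numbers.
vrps = [("198.51.100.0/24", 24, 64500)]
print(validate_origin("198.51.100.0/24", 64500, vrps))  # right origin, right size
print(validate_origin("198.51.100.0/24", 64501, vrps))  # wrong origin AS
print(validate_origin("198.51.100.0/25", 64500, vrps))  # more specific than allowed
print(validate_origin("203.0.113.0/24", 64500, vrps))   # no covering ROA at all
```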

ROAs, it turns out, are pretty painless things to deploy (visit your RIR's site with a valid organisational login, and issue away after carefully considering your prefixes and what the adequate ROAs will be). They won't solve all possible types of BGP hijack (particularly not some kinds of intentional attempts), but they should certainly cut down on "accidental" route origination. And one other problem goes away too...

The Big Surprise

So pretty much anyone who regularly uses BGP in their day job (particularly in service provider networks) has memorised the BGP Bestpath selection algorithm for their particular brand(s) of router (e.g. Cisco, Juniper, other vendors). People spend a LOT of time and effort tuning various BGP parameters to carefully control how their routes are treated (and how they treat their customer's routes, in the case of ISPs) - carefully making use of that knowledge to achieve their desired routing policy (or even as basic a thing as a working Internet connection - or for fun/profit).

Newsflash: Valid ROA status supersedes ALL other Bestpath selection processes - if there is a single Valid route, and all the others are Unknown or Invalid, the Valid one will be picked (even if it otherwise has a fairly low priority).

Here's what Cisco (the vendor of the affected upstream ISP) has to say about Origin AS Validation:
"By default, a prefix marked as Not Found is installed in the BGP routing table and will only be flagged as a bestpath or considered as a candidate for multipath if there is no Valid alternative (independently of other BGP attributes such as Local Preference or ASPATH)." p2, Cisco BGP—Origin AS Validation document - emphasis mine.
Furthermore:
"During BGP best path selection, the default behavior, if neither of the above options is configured, is that the system will prefer prefixes in the following order:
   - Those with a validation state of valid.
   - Those with a validation state of not found.
   - Those with a validation state of invalid (which, by default, will not be installed in the routing table).
These preferences override metric, local preference, and other choices made during the bestpath computation.
The standard bestpath decision tree applies only if the validation state of the two paths is the same." p4, (both emphases mine), Cisco BGP—Origin AS Validation document
This means that a single spurious "valid" route can wreak merry hell on your routing.
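To make that concrete, here is a toy model of the ordering Cisco describes: validation state is compared first, and only then do the usual attributes get a look in. It is a sketch under my own assumptions (only Local Preference and AS path length are modelled, and the next hops and AS numbers are invented), not the real bestpath algorithm.

```python
from dataclasses import dataclass

# Lower rank wins; invalid routes are normally not installed at all.
STATE_RANK = {"valid": 0, "not-found": 1, "invalid": 2}


@dataclass
class Path:
    next_hop: str
    state: str
    local_pref: int
    as_path: tuple


def best_path(paths):
    """Pick a best path, comparing validation state before anything else."""
    candidates = [p for p in paths if p.state != "invalid"]
    return min(
        candidates,
        key=lambda p: (STATE_RANK[p.state], -p.local_pref, len(p.as_path)),
    )


# Hypothetical paths: the one you actually want has the better attributes...
wanted = Path("192.0.2.1", "not-found", 250, (64510, 64500))
# ...but a spuriously "valid" path with worse attributes still wins.
spurious = Path("192.0.2.2", "valid", 150, (64511, 64509, 64510, 64500))
print(best_path([wanted, spurious]).next_hop)
```

If both paths carry the same validation state, the sketch falls back to Local Preference and path length, which is the "standard decision tree applies only if the state is the same" part of the quote.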

Amazingly, there doesn't seem to be a "knob" to disable this behaviour for specific prefixes (like, say, your customers) - you can override what happens to Invalid, but not Unknown / Not Found - other than turning it off entirely - which rather defeats the point...

We came across a fairly tragic (repeated) instance where this happened to us - a nonsense path "randomly" became marked as Valid, and that resulted in our traffic getting blackholed in some way (most likely the routing within our ISP was then b0rked, because there wasn't actually a working route to us from that "valid" router, which we were not connected to). Sadly, only one of our ISPs has a looking glass we can use to interrogate their view of the world. When traceroute fails to your ISP, you know you're having a problem...!

Here are two examples, one of a case I'm calling "Phantom Validity", and one of "Sticky Validity". Whether this is a bug in Cisco routers, a particular glitchy router or a glitch in the Relying Party Validators, I can't say, but I've sent all available info to the ISP for them to digest.

Phantom Validity:
Network          Next Hop            Metric LocPrf Weight Path 
N* ia 146.231.128.0/21 41.78.189.211 0  150   0 3356 174 2018 37520 37520 37520 i 
N* ia                  41.78.189.234 0  150   0 3356 174 2018 37520 37520 37520 i 
N* ia                  41.78.189.226 0  150   0 1299 174 2018 37520 37520 37520 i 
N* ia                  41.78.189.210 0  150   0 2914 174 2018 37520 37520 37520 i 
N* i                   41.78.189.211 0  150   0 2914 174 2018 37520 37520 37520 i 
N* i                   41.78.189.234 0  150   0 2914 174 2018 37520 37520 37520 i 
N* i                   41.78.189.211 0  150   0 1299 174 2018 37520 37520 37520 i 
N* i                   41.78.189.210 0  150   0 1299 174 2018 37520 37520 37520 i 
N* i                   41.78.189.234 0  150   0 6453 174 2018 37520 37520 37520 i 
N* i                   41.78.189.226 0  150   0 3257 174 2018 37520 37520 37520 i 
N* i                   41.78.189.234 0  150   0 6762 174 2018 37520 37520 37520 i 
N* i                   41.78.189.210 0  150   0 174 2018 37520 37520 37520 i 
N* i                   41.78.189.226 0  150   0 174 2018 37520 37520 37520 i 
V*>                    196.60.8.216  0  250   0 2018 37520 37520 37520 i 
N* ia                  41.78.189.188 0  250   0 2018 37520 37520 37520 i 
N*bia                  41.78.189.156 0  250   0 2018 37520 37520 37520 i 
N* ia                  41.78.189.116 0  250   0 2018 37520 37520 37520 i 
N* ia                  41.78.189.117 0  250   0 2018 37520 37520 37520 i 
N* ia                  41.78.189.52  0  250   0 2018 37520 37520 37520 i
Above, the route flagged V is incorrectly marked as "V"alid - it's "phantom" valid, and in this state, all hell breaks loose (all of them should be N). We saw this happen on a prior occasion to another of their routers (different IP) - and probably several other weird routing outages like this were due to this glitch (we just didn't have the looking glass's view for those outages, because we spent some time thinking it was a problem with our firewalls or internal routing gone squiffy).

Here's a possibly related thing - spurious validation of a prefix with no covering ROA (it expired several days previously) - all should be "N", NOT "V" - perhaps once a route is "valid" it's "sticky"? (We reset BGP connections to both upstream ISPs and that did not help). Their validator cache correctly said that this wasn't covered by a ROA - so I think "Cisco might have a problem".

Sticky Validity:
Network          Next Hop            Metric LocPrf Weight Path 
V* ia 192.42.99.0      41.78.189.226 0  150   0 1299 174 2018 37520 37520 37520 i 
V* ia                  41.78.189.234 0  150   0 3356 174 2018 37520 37520 37520 i 
V* ia                  41.78.189.211 0  150   0 3356 174 2018 37520 37520 37520 i 
V* i                   41.78.189.234 0  150   0 6762 174 2018 37520 37520 37520 i 
V* i                   41.78.189.226 0  150   0 3257 174 2018 37520 37520 37520 i 
V* i                   41.78.189.234 0  150   0 6453 174 2018 37520 37520 37520 i 
V* i                   41.78.189.234 0  150   0 2914 174 2018 37520 37520 37520 i 
V* ia                  41.78.189.210 0  150   0 2914 174 2018 37520 37520 37520 i 
V* i                   41.78.189.211 0  150   0 2914 174 2018 37520 37520 37520 i 
V* i                   41.78.189.210 0  150   0 1299 174 2018 37520 37520 37520 i 
V* i                   41.78.189.211 0  150   0 1299 174 2018 37520 37520 37520 i 
V* i                   41.78.189.210 0  150   0 174 2018 37520 37520 37520 i 
V* i                   41.78.189.226 0  150   0 174 2018 37520 37520 37520 i 
V* ia                  41.78.189.117 0  250   0 2018 37520 37520 37520 i 
V* ia                  41.78.189.116 0  250   0 2018 37520 37520 37520 i 
V* ia                  41.78.189.52  0  250   0 2018 37520 37520 37520 i 
V*bia                  41.78.189.156 0  250   0 2018 37520 37520 37520 i 
V* ia                  41.78.189.188 0  250   0 2018 37520 37520 37520 i 
V*>                    196.60.8.216  0  250   0 2018 37520 37520 37520 i
We think we're amongst the first people to see glitches like this - I had a quick google and couldn't find much about it - but it's there waiting to bite people.

Apparently, the affected ISP is the "first in Africa" to deploy RPKI Origin Validation. I've raised the issue with them, so hopefully it's something their NOC now know to look for if their customers have weird routing glitches - we've deployed ROAs for all our prefixes now, so we don't care about either of these errors any more!
Hopefully they in turn will open a ticket with their router vendor to figure it out.

Glitchiness as a "feature"...

A few days back (after writing the above), I asked for some feedback about the odd glitches with RPKI we'd seen from the ISP.

They were kind enough to reply with the following illuminating information (slightly edited):
There isn't so much a document for this 'bug'. It's more that their implementation results in undesirable effects in our setup.
There are 2 main issues:
1) Cisco routers will prefer routes with a status of “Valid” to those with a status of “Not Found”. This behaviour does not appear to be configurable. (In our case we want to treat these routes equally and only drop "Invalids")
2) When validation state checking is enabled by establishing an RPKI-RTR session with a cache, only those routes learnt from an eBGP neighbour are evaluated, with all iBGP-learnt routes automatically marked “Valid”
The implication of this is that as a route is received over an eBGP session on an ASBR, the route is correctly marked with a status "Not Found". However once this route is transmitted over iBGP to the rest of the network, it is then marked as "Valid" by the recipient iBGP speakers and since the routers will prefer "Valid" over "Not found", the undesirable routing path is selected.
So that pretty much mirrors what we saw. 1 is to be expected (and is, once you dig in, documented as noted above), but 2 is a little surprising, and seems to be what caused our traffic to die a horrible death, and is, as far as I can see, responsible for "phantom validity" as discussed above.
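Point 2 is easier to see with a tiny model. This is purely my reading of the ISP's description (eBGP-learnt routes are evaluated, iBGP-learnt ones are simply stamped "valid"); the function names are invented and it is not a statement about how any router is actually coded.

```python
def state_on_receive(session_type, computed_state):
    """Validation state a router assigns under the behaviour described above.

    eBGP-learnt routes keep the state computed from the RPKI cache;
    iBGP-learnt routes are not evaluated and end up marked 'valid'.
    """
    if session_type == "ebgp":
        return computed_state
    return "valid"


def propagate(computed_state):
    """Follow one route from the ASBR to an internal iBGP speaker."""
    at_asbr = state_on_receive("ebgp", computed_state)
    at_internal = state_on_receive("ibgp", at_asbr)
    return at_asbr, at_internal


# A prefix with no ROA: correct at the edge, "phantom valid" inside.
print(propagate("not-found"))
```

Combine that with "Valid always beats Not Found" and an internal router can end up preferring whichever copy of the route happened to arrive over iBGP.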

The Big Fix

All you have to do is publish a ROA for your prefixes. Simple!

Obviously this assumes a) you have a BGP AS and b) you have a provider independent allocation of IPv4 and/or IPv6 address space. You simply use the tools your RIR (AFRINIC, RIPE NCC, LACNIC, APNIC, ARIN) provides. It's very straightforward (once you jump through any RIR hoops), and I can't see that it has any downsides, so long as the ROAs you publish actually reflect the way you announce your address space. Obviously, if you don't do BGP or "own" a PI resource, there's not much you can do about it, but your ISP may be able to help. Typically, by the time you're doing BGP, you've got PI space and (at least) two upstream providers, so your RIR will be where you'll go to set up ROAs.

There are numerous guides out there that will tell you a lot about these, but my suggestion is:
  1. Create one or more ROAs (you might want to manage different address families, or different blocks of IP space, independently) - but ensure that you keep the largest prefix in the same object as the smaller ones. You can have multiple address families and prefixes in one ROA, too. 
    1. i.e. if you have a /16, publish a ROA for the whole /16, plus within the same ROA, any smaller prefixes you will actually advertise (rather than using maxlength /24 to allow you to dis-aggregate whenever, which is not always clever). 
  2. Match your ROAs to the route and/or route6 objects you're publishing in your IRR records - start by getting your IRR records right, and then replicate that same policy in your ROAs. 
  3. Make sure you keep both up to date with the routes you're actually exporting. 
  4. Put the expiry date of your ROAs into your calendar, and renew them before they expire! 
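Steps 1 to 3 boil down to "what you publish should match what you announce", which is easy enough to check mechanically before you press the button at your RIR. A rough sketch, assuming you keep a list of the prefixes you export and the ROA entries you plan to publish; the function name, the 10.20.0.0/16 stand-in block and AS 64500 are all made up.

```python
import ipaddress


def audit_roas(announcements, roa_entries, asn):
    """Compare exported routes with planned ROA entries.

    announcements: list of prefixes you actually export.
    roa_entries: list of (prefix, max_length, origin_asn) tuples.
    Returns (uncovered announcements, ROA entries with a loose max length).
    """
    announced = {ipaddress.ip_network(p) for p in announcements}
    uncovered = []
    for net in sorted(announced, key=str):
        ok = any(
            origin == asn
            and ipaddress.ip_network(p).version == net.version
            and net.subnet_of(ipaddress.ip_network(p))
            and net.prefixlen <= max_len
            for p, max_len, origin in roa_entries
        )
        if not ok:
            uncovered.append(str(net))
    # Entries whose max length allows more specifics than the prefix itself.
    loose = [
        p for p, max_len, _ in roa_entries
        if max_len > ipaddress.ip_network(p).prefixlen
    ]
    return uncovered, loose


exports = ["10.20.0.0/16", "10.20.192.0/18"]
# Forgot to list the /18: it would be Invalid once the ROA is live.
print(audit_roas(exports, [("10.20.0.0/16", 16, 64500)], 64500))
# Listing both explicitly covers everything without a loose max length.
print(audit_roas(exports, [("10.20.0.0/16", 16, 64500),
                           ("10.20.192.0/18", 18, 64500)], 64500))
```

Anything in the first list will turn Invalid the moment the ROA goes live, so fix those before publishing.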
If you have a crazy fast need for automated changes (like maybe you're a massive global enterprise, and you need to have a really flexible routing policy - like perhaps a massive cloud hyperscale company, or transit provider) there are ways of delegating a RPKI CA-like function to your organisation so you can completely control the ROA generation process in house (but you'll probably need $$$$$ cryptographic HSMs to do it right) - RPKI Delegation. There are even existing tools to help you do this, like Krill.

Once you've jumped through your RIR's RPKI ROA hoops, this will mean that your intentionally announced prefixes will be covered by real, valid ROAs, so the normal bestpath selection algorithms will "win", and you'll get the desired results. They propagate surprisingly quickly; several sites will let you check the status of RPKI records for specified prefixes.

RPKI ROAs are a really "low harm" (and to be honest, really low effort) thing to deploy (which is good, because it's going to mean it's much easier and faster to get this done at global scale, for the good of all).

With the interesting effects noted above, as ISPs and IXPs increasingly do RPKI OV, you're going to have a better time if you do have a ROA than if you don't! AKA do it sooner rather than later.

What's not fixed in this shiny new ROA world? 

So RPKI Route Origin Validation does not solve all of our problems (on a global basis).

The main thing that isn't really totally solved are malicious hijacks - people who intentionally try to grab your traffic.

However, some basic AS PATH sanity checking and ROA validation certainly should help stop "accidental" route hijacks and leaks. Last month, a glitch caused chaos in Europe; there were prior cases, such as when Pakistan blackholed YouTube (and an ongoing list of these things); and not long ago, there was the infamous case where Verizon didn't filter nonsense from clients and broke Cloudflare (amongst other things). The latter would probably have been mitigated, so long as the leaked prefixes were smaller than those covered by ROAs. A lot of this is really up to the "edge" of highly connected networks (i.e. ISPs need to filter their customers). It's WAY harder to do this in the DFZ - but anyone who sells access to this ought to filter their customers...!

One critical problem it doesn't address is whether a router that claims to be AS XYZ is actually a router belonging to AS XYZ! I could, for instance, run a router at some random spot on the Internet, and (if my peers are not all that fussy) simply declare it to be a member of AS XYZ and start originating routes - or more sneakily, pretend to be your ISP and fake origin routes from you (a related problem to direct origin AS spoofing). If I match the origin AS to the prefixes you actually use, then these routes may even be marked as RPKI Valid!
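A few lines of Python show why. Origin validation only ever looks at the last AS in the path, so anything prepended in front of a "correct" origin sails through. The AS numbers are documentation values and the helper names are mine.

```python
def origin_of(as_path):
    """Origin validation only looks at the right-most AS in the path."""
    return as_path[-1]


def origin_check(as_path, authorised_origins):
    """Return True if the path's origin AS is authorised by a ROA."""
    return origin_of(as_path) in authorised_origins


authorised = {64500}          # the AS named in the (hypothetical) ROA
genuine = (64510, 64500)      # real upstream, real origin
clumsy = (64511,)             # accidental hijack: wrong origin, caught
forged = (64511, 64500)       # attacker pretends the origin is behind them

for path in (genuine, clumsy, forged):
    print(path, origin_check(path, authorised))
```

The forged path passes exactly as the genuine one does, which is the gap path validation would have to close.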

So, what we need next is some way of proving an origin router is actually legitimately a member of that AS. This is a Hard problem, because routers don't really have any good way of figuring out if some IP address waaaaay on the other side of the world is a valid member of a particular AS, and that the AS PATH hasn't simply been spoofed to look "right".
Normally, you'd end up having to have a "handshake" between both endpoints (and arguably, every router in between) to prove routers are what they claim to be - this could be incredibly taxing to the global routing infrastructure (and it's not a feature that exists). It's way more work if literally every router on the Internet might come up to you and do the equivalent of a TLS handshake - which is the real "death knell" to this (there are other complications too, of course). So the "obvious" routes don't scale to Internet size (let alone the potential DDOS abuse factor). Which is, presumably, why we've not made much progress here.

An alternative might be something like BGP Path Validation. IRR based filters can, to some degree, already help with this (you can publish a policy in WHOIS of what valid BGP paths should look like with as-path). Obviously, a determined attacker could spoof that, too, and they merely need to spoof a long enough AS-PATH before their own AS to get your traffic... though longer AS Paths are less desirable! This doesn't scale well, either. Is a path AS4 AS3 AS2 AS1 valid? You're not a customer of AS 3 or 4, who are massive "tier one" ISPs. The only really definitive statements you can make are about your own AS and the immediately upstream ASs (i.e. your direct Peers).

It is likely that the only way we're going to "fix" these kinds of problems is a lot of work by ISPs, IXPs and mega-transit providers in sanity checking the routes they're learning from peers of various sorts. For IXPs and ISPs, MANRS is a good start to this. Obviously, this requires more cooperation, but leverages existing things (contractual or trust relationships) that already exist.

One final "problem" (conspiracy theory?) is that because the RIRs control the TALs, theoretically, the jurisdiction in which the RIR operates can "seize" anyone's resources (i.e. by messing with the ROA certificates). I suspect this is A) highly unlikely and B) the "ignore invalid" knob paired with import filters on prefixes hijacked by rogue RIRs (or their host country) will work around it in minutes/hours as news spreads that this has happened - not a valid concern IMO - if they break RPKI that badly, operators will stop using it.

What else can you do? 

Whilst you're thinking about securing your routing, what other steps can you take?

Sane export policy - the most important step

As an "end user" network, you can also help to ensure that you don't inadvertently leak absolute nonsense onto the Internet with suitably strict export filters. Whilst you're checking your export policies, make sure they're as compact as possible (don't announce smaller prefixes if you don't have to - the larger and more contiguous your prefixes, the smaller you help keep the global routing table) - i.e. if you don't actually have separate routing policies in place for parts of your /16, advertise it as a single /16, not 256 /24s - and don't leak what is your IGP into your EGP - aggregate your routes! Make sure you can't announce things like bogons or prefixes you don't own. There is plenty written elsewhere about good export policies, and if you're exposing the world to your BGP configuration, you owe the Internet a few hours (at least) of learning on the topic to make sure you're not b0rking it.
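As a rough illustration of what a "fail safe" export policy does, here is a sketch that drops bogons, drops anything outside your own allocations, and aggregates what is left. It is IPv4 only, the bogon list is deliberately short, and the function name and prefixes (documentation ranges standing in for real allocations) are assumptions; your router's prefix-lists and route-maps are where this really lives.

```python
import ipaddress

# A few well-known non-routable IPv4 ranges; a real bogon list is longer.
BOGONS = [ipaddress.ip_network(n) for n in
          ("0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
           "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3")]


def export_set(candidates, own_allocations):
    """Return the aggregated IPv4 prefixes that are safe to announce."""
    own = [ipaddress.ip_network(n) for n in own_allocations]
    allowed = []
    for text in candidates:
        net = ipaddress.ip_network(text)
        if net.version != 4:
            continue  # this sketch handles IPv4 only
        if any(net.overlaps(b) for b in BOGONS):
            continue  # never announce bogons
        if not any(o.version == 4 and net.subnet_of(o) for o in own):
            continue  # never announce space you do not hold
        allowed.append(net)
    # Merge adjacent blocks so the world sees as few routes as possible.
    return [str(n) for n in ipaddress.collapse_addresses(allowed)]


leaky = ["198.51.100.0/25", "198.51.100.128/25",  # two halves of our /24
         "192.168.1.0/24",                        # IGP leak (private space)
         "203.0.113.0/24"]                        # somebody else's prefix
print(export_set(leaky, ["198.51.100.0/24"]))
```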

Just because your ISP claims to filter your prefixes doesn't mean they'll get it right (everyone has bad days) - and the more checks and balances there are between network engineers having a bad day (or router software glitches) and the global internet routing table, the better. Spend the time making your BGP export policy configuration robust and "fail safe".

If you're not already using an IRR, publish route / route6 objects

Use your RIR's tools to publish route and/or route6 objects; other organisations can use these to build better filters. Some ISPs (particularly those doing MANRS in earnest) will insist these exist before they'll accept your routes. Ensure you understand what you're doing before you mess around with this!  This post is long enough without going into IRR records in detail... :)

Don't export itty bitty routes for no reason

If you're announcing a prefix that is Provider Aggregate (PA) space from your ISP back to them (to establish a BGP route for that prefix, for instance), add the NO-EXPORT BGP community attribute to that/those prefix(es); obviously, don't do this for Provider Independent (PI) space, because you probably do want them to export that for you...
You can assume they've got a covering prefix for their own space that will result in the Internet maintaining reachability to you - just without the nonsense of some small part of their overall allocation polluting global routing tables for no good reason. So if they've allocated 10.2.3.0/24 to you, and they announce 10.0.0.0/8, you're good to no-export this prefix on your edge router(s).
Rule of thumb: If a prefix is only announced by one ISP, and their routing policy is no different to yours, and it's PA space, make sure their routers don't announce it any further by using a no-export community on that prefix when you export it.
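The rule of thumb translates into a very small decision: if the prefix sits inside your ISP's PA block, tag it with the well-known NO_EXPORT community (65535:65281); otherwise leave it alone. A sketch with a made-up function name, using the 10.2.3.0/24 example from above and a documentation prefix standing in for PI space:

```python
import ipaddress

NO_EXPORT = "65535:65281"  # well-known community 0xFFFFFF01 (RFC 1997)


def communities_for(prefix, pa_blocks):
    """Return the communities to attach when exporting prefix upstream."""
    net = ipaddress.ip_network(prefix)
    in_pa = any(
        net.version == ipaddress.ip_network(b).version
        and net.subnet_of(ipaddress.ip_network(b))
        for b in pa_blocks
    )
    return [NO_EXPORT] if in_pa else []


isp_pa_space = ["10.0.0.0/8"]  # what the ISP itself announces
print(communities_for("10.2.3.0/24", isp_pa_space))      # PA: keep it local
print(communities_for("198.51.100.0/24", isp_pa_space))  # PI: export freely
```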

If you have PI space, announce all your prefixes to all your Peers/ upstream ISPs (unless you have a compelling reason not to), but be as compact as possible (the only reason to split up a large block is because you have different routing policy for some smaller blocks for e.g. traffic engineering purposes, or restrictions on particular kinds of traffic over some ISP links, etc).

We've recently started doing just this, and the prefixes the global Internet sees from us no longer contain PA space (as it should be!) - we've gone from announcing 10 IPv4 prefixes to 4; and from 5 IPv6 to 2. That's 6 fewer routes in the global IPv4 table, and 3 fewer for IPv6. Almost all of that was through adding no-export to PA space we're using (and withdrawing one /21 prefix we no longer have separate routing policy for).

You can get this really wrong...
Here's how NOT to advertise routes when you're single homed...!
That's *55* routes, which could be summarized as just 4 - with no difference to the organisation.
I've obscured the origin AS Number, but it's another University in South Africa. It looks like they've "leaked" their IGP into BGP. Data from bgp.he.net.
There are, absolutely, valid reasons for splitting up your address allocation (usually for traffic engineering, or because the routing policy between different prefixes differs), but for most organisations, most of the time, what you want to route is your entire prefix; don't clutter the global routing table unnecessarily (particularly if you have a honkin' great big chunk of Legacy space). It kind of irks me that because of binary trees, I need to advertise 3 prefixes from a /16 when I actually want to only advertise 2, but there's a limit to what CIDR can do in aggregating on bit boundaries (we use the top 1/4 of the space for different purposes than the bottom 3/4; subnet maths means that's most compactly a /17 and two /18s)
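If you want to check the subnet maths, Python's ipaddress module will do it for you. This uses 10.20.0.0/16 as a stand-in for a real /16 (an assumption; substitute your own block):

```python
import ipaddress

block = ipaddress.ip_network("10.20.0.0/16")   # stand-in for a real /16
quarters = list(block.subnets(new_prefix=18))  # four /18s

# The bottom three quarters share one policy; the top quarter has another.
bottom = list(ipaddress.collapse_addresses(quarters[:3]))
top = [quarters[3]]

announcements = [str(n) for n in bottom + top]
print(announcements)
# ['10.20.0.0/17', '10.20.128.0/18', '10.20.192.0/18']
```

Three quarters of a /16 can't be expressed as a single CIDR block, so the best you can do is a /17 plus a /18, with the remaining /18 announced separately.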


Insist on MANRS and RPKI OV

You can also ask your ISPs or IXPs if they're doing anything about MANRS or RPKI OV - and even build that into your purchasing decisions (put it into your RFPs as a bid specification item!).

Make sure your ISP(s) have a BGP Looking Glass...! As someone who runs a network using BGP, it is useful to know a couple of public ones on top of any your ISP may run (your ISP's is useful for figuring out what they're doing with your prefixes; others are useful for views of what the global Internet is doing with them).

A good sign is when your ISP publicly lists the BGP communities they make use of (which are quite handy in BGP policy, both on import (which routes you accept and use) and export (which prefixes you advertise and how)).

At the end of the day, ROA helps you against accidental (or unsophisticated intentional) prefix hijacks by 3rd parties. However, protection against intentional spoofing will require much more work by ISPs, IXPs and global transit companies ("Tier One ISPs"). The more ISPs and IXPs are strict about MANRS and RPKI OV, the more effective they'll get. At some stage, it will reach a tipping point, and NOT deploying these things will be a problem.

Monitor your prefixes on the global Internet

You can subscribe to services that will alert you to BGP routing changes, like BGPMon to help you see any "funny business" that might happen to your prefixes - importantly, this is an "internet scale" view.

Do your own prefix validation

If you want to filter prefixes for validity (say, for example, your upstream ISP doesn't), you can get your edge routers to do so. Find a reliable 3rd party validator you trust, or run several of your own. There are good guides available online; Juniper, for instance, has a "Day One" guide that steps you through everything you need to know.

Some other quick wins


  • If you peer at INXs, make use of peeringdb
  • If you're really big, and don't want to look like a tit (hello, Verizon), have a responsive 24/7/365 NOC, and consider INOC-DBA
  • If you're quite small, and don't run a 24/7/365 IT support operation, make sure your upstream ISPs DO have 24/7/365 NOCs - and that they have contact details for your network engineering team; answer the call if it happens at 3am...! If you're doing BGP, it's serious business.

Finally... 

It's *very* hard to secure the "core" of the Internet, but it's quite easy if you start at the "edge". Secure your edge, and help to get ISPs and IXPs to secure theirs (and tell your friends to get on the same bandwagon).

It's really, really helpful if your ISP has a responsive support NOC (the ISP mentioned here does, and we like that a lot; of course, you get what you pay for, and there are a lot more zeroes involved in monthly invoices than with a typical SOHO ISP!); you can get a lot of investigation done if they have a Looking Glass (i.e. you can probably tell them what's broken to save them time!).


2 comments:

  1. Shouldn't this "phantom validity" problem be fixed by the "Announce RPKI Validation State to Neighbors" feature as explained in the document?

    Either this ISP did not configure that (so everything from iBGP was valid, as opposed to the actual state), or the feature is buggy.

  2. Possibly. It depends if what is triggering some of these problems is "bug" or "feature". From later correspondence with the ISP (yesterday in fact), it sounds like they have not enabled "announce rpki validation state to neighbours" per https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/configuration/xe-3s/irg-xe-3s-book/bgp-origin-as-validation.html#GUID-26F1E42B-7A1E-4AFB-A351-0D5238ECA01A

    Interestingly, doing some more googling this morning, I managed to find a post from 2015 in a local NOG mailing list with people from this ISP (and others) discussing bugs in Cisco concerning RPKI validation - so buggy seems to have a long life-time.

    Thanks for the comment, Lukas. :)
