06
Sep
14

Dummies, link local and the loop back

So we had a bit of an incident a few weeks back where an interface got configured with the wrong IPv6 prefix length. The anycast network we have is a /48, however it was addressed with a /32 prefix. This had the obvious effect that the 65,535 other /48 networks we were now covering were considered to be on-net and were therefore effectively unreachable. This was noticed fairly quickly and corrected, and we assumed that would be the end of it. We were wrong! The change seemed to apply correctly at the Linux layer but for some reason Quagga needed an extra kick.
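To put a number on the blast radius, here is a minimal sketch using Python's standard `ipaddress` module. The prefixes are the documentation ones used in the lab config in this post, not our production space, and the variable names are just for illustration.

```python
import ipaddress

configured = ipaddress.ip_network("2001:db8::/32")  # prefix length applied by mistake
intended = ipaddress.ip_network("2001:db8::/48")    # the real anycast network

# Every extra prefix bit halves the network, so a /32 holds 2**(48-32) /48s.
total_48s = 2 ** (intended.prefixlen - configured.prefixlen)
foreign_48s = total_48s - 1  # everything except our own /48

print(total_48s, foreign_48s)          # 65536 65535
print(intended.subnet_of(configured))  # True
```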

Our anycast network is fairly large, consisting of ~200 nodes announcing to ~90 upstream networks, most of whom transit our network. It just so happened that one of the upstreams had an assignment in one of those adjacent networks. Murphy’s Law being what it is, this upstream was also our largest host, providing ~50 locations, with all BGP connections addressed from this space.

OK, so before progressing, below is an example of the broken config we were working with. In the following, Router1 is our hardware and Router2 is the upstream network.

Router1 – Interface configuration

auto eth0
iface eth0 inet static
  address 192.0.2.200/24
  gateway  192.0.2.1
 
iface eth0 inet6 static
  address 2001:DB8:1::64/64

auto dummy0 
iface lo inet6 loopback
iface dummy0 inet6 static
   address 2001:DB8::42/32

Router 1 – Quagga Config

router bgp 64496
!
 bgp router-id 192.0.2.200
!
 neighbor 2001:DB8:1::1 remote-as 64497
 no neighbor 2001:DB8:1::1 activate
 neighbor 2001:DB8:1::1 description upstream
!
 address-family ipv6
  neighbor 2001:DB8:1::1 activate
  neighbor 2001:DB8:1::1 soft-reconfiguration inbound
  neighbor 2001:DB8:1::1 prefix-list prefix-v6 out
  network 2001:DB8::/48
 exit-address-family
!
ipv6 prefix-list prefix-v6 seq 2 permit 2001:DB8::/48
!
line vty
!

Router2 – Interface config

# The primary network interface
auto eth0
iface eth0 inet static
  address 192.0.2.201/24
  gateway  192.0.2.1
 
iface eth0 inet6 static
  address 2001:DB8:1::1/64

Router2 – quagga config

router bgp 64497
!
 bgp router-id 192.0.2.201
!
 neighbor 2001:DB8:1::64 remote-as 64496
 no neighbor 2001:DB8:1::64 activate
!
 address-family ipv6
  neighbor 2001:DB8:1::64 activate
  neighbor 2001:DB8:1::64 default-originate
  neighbor 2001:DB8:1::64 soft-reconfiguration inbound
 exit-address-family
!
line vty
!

So as you can see, the problem with this config is that the network dummy0 is on now encompasses the network of eth0. At this point I would like to say that, in my opinion, everything should still work. Router1 should announce 2001:DB8::/48 to Router2 with a next hop of 2001:DB8:1::64. As Router1 and Router2 are on the same network, the overlapping prefixes should not cause an issue. The announcement will get propagated and, as long as you are not in one of the other covered networks, there should be no problem. This is what seemed to happen, as shown below.
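The overlap itself is easy to detect mechanically. The following is a rough sketch of a sanity check, assuming the interface addresses from the Router1 config above; `find_overlaps` is a hypothetical helper, not something from Quagga or ifupdown.

```python
import ipaddress
from itertools import combinations

# Interface addresses copied from the Router1 configuration above.
interfaces = {
    "eth0": "2001:db8:1::64/64",
    "dummy0": "2001:db8::42/32",
}

def find_overlaps(ifaces):
    """Return pairs of interface names whose connected networks overlap."""
    nets = {name: ipaddress.ip_interface(addr).network for name, addr in ifaces.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]

print(find_overlaps(interfaces))  # [('dummy0', 'eth0')]
```

Addressing dummy0 as a /128 makes the same check come back empty, since a single host address cannot cover the eth0 /64.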

Router2 – show ipv6 bgp

router2# show ipv6 bgp
BGP table version is 0, local router ID is 192.0.2.201
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
 
   Network          Next Hop            Metric LocPrf Weight Path
*> 2001:db8::/48    2001:db8:1::64           0             0 64496 i
 
Total number of prefixes 1

The problem was that we were seeing issues with this host. Using the RIPE Atlas network we could see that probes hitting this host were receiving an ICMPv6 Type 1, Code 3 or, in human terms, Address Unreachable (the IPv6 equivalent of Host Unreachable). Looking at Quagga we could see that the earlier interface change had taken effect, but there were some strange routes which seemed to be hanging about. Quagga was restarted and things started to work again. The rest of this article was done in a lab environment, however it is reliably repeatable in the current version of Quagga (0.99.23 at the time of writing).

So with the lab in place it was time to do a bit more research. We have the show ipv6 bgp above; let’s take a look at a show ipv6 route.

Router2 – show ipv6 route

router2# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3,
       I - ISIS, B - BGP, * - FIB route.
 
C>* ::1/128 is directly connected, lo
B>* 2001:db8::/48 [20/0] via fe80::7c52:97ff:fe39:f4b7, eth0, 00:02:59
C>* 2001:db8:1::/64 is directly connected, eth0
C>* fe80::/64 is directly connected, eth0

A link-local address has been installed into the routing table, but there is nothing wrong with that. RFC 2545 allows a router to send a link-local address in addition to its global address as the next hop when sending an announcement. However, on closer inspection we can see that this is the link-local address of the dummy interface and not the link-local address of the Ethernet interface whose global address was sent as the next hop.

Router1 – show interface

 
router1# show interface
Interface dummy0 is up, line protocol detection is disabled
  index 3 metric 1 mtu 1500
  flags: <UP,BROADCAST,RUNNING,NOARP>
  HWaddr: 7e:52:97:39:f4:b7
  inet6 2001:db8::42/48
  inet6 fe80::7c52:97ff:fe39:f4b7/64
Interface eth0 is up, line protocol detection is disabled
  index 2 metric 1 mtu 1500
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  HWaddr: 08:00:27:15:bd:18
  inet 192.0.2.200/24 broadcast 192.0.2.255
  inet6 2001:db8:1::64/64
  inet6 fe80::a00:27ff:fe15:bd18/64
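If you want to confirm which interface a link-local next hop belongs to, you can derive it from the HWaddr. This is a small sketch assuming the kernel builds link-local addresses with modified EUI-64, which matches the output above but is not guaranteed on every system; `link_local_from_mac` is a made-up helper name.

```python
import ipaddress

def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to form the 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix: fe80 followed by 48 zero bits, then the interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)

print(link_local_from_mac("7e:52:97:39:f4:b7"))  # fe80::7c52:97ff:fe39:f4b7 (dummy0)
print(link_local_from_mac("08:00:27:15:bd:18"))  # fe80::a00:27ff:fe15:bd18 (eth0)
```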

The dummy interface does not perform any neighbour advertisements, or if it does they do not go onto the physical network that eth0 is connected to. As such, Router2 is unable to reach the next hop present in its routing table, which explains the unreachable messages we were seeing.

Router1’s routing table also looked strange. We receive a default route from Router2, so let’s see how that looks.

Router1 – show ipv6 route

ubuntu# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3,
       I - ISIS, B - BGP, * - FIB route.
 
B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, dummy0, 00:01:25
C>* ::1/128 is directly connected, lo
C>* 2001:db8::/47 is directly connected, dummy0
C>* 2001:db8:1::/64 is directly connected, eth0
C * fe80::/64 is directly connected, dummy0
C>* fe80::/64 is directly connected, eth0

As you can see, we receive the default route, again with a link-local address. In this case it is the link-local address of Router2’s eth0, so all good there, however it has been installed against dummy0. As we have a physical connection to this link-local address we can route to it. The problem is that any outgoing traffic is going to be sourced from the address of the dummy interface and, as this is an anycast address, that is, well, less than ideal.

So for some reason Quagga is picking the link-local address of the dummy interface to include in its announcements, and it is installing the routes it receives against the dummy interface. So we know it is not basing this decision on the interface with the most specific network. It is also not basing it on the interface that received or sent the announcements. Finally, it is not announcing the link-local address associated with the global address it’s sending. So what is it basing this decision on?

A bit of head scratching and it occurred to me that dummy0 would come before eth0 in a lexicographical search. I didn’t think Quagga would be making this decision based on such an arbitrary criterion, especially when there are better criteria available. However, it was worth a shot. I renamed eth0 to ath0 and took a look.

Router1 – show ipv6 route

router1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3,
       I - ISIS, B - BGP, * - FIB route.
 
B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, ath0, 00:01:30
C>* ::1/128 is directly connected, lo
C>* 2001:db8::/47 is directly connected, dummy0
C>* 2001:db8:1::/64 is directly connected, ath0
C * fe80::/64 is directly connected, dummy0
C>* fe80::/64 is directly connected, ath0

Router2 – show ipv6 bgp and show ipv6 route

router2# show ipv6 bgp
BGP table version is 0, local router ID is 192.168.1.201
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
 
   Network          Next Hop            Metric LocPrf Weight Path
*> 2001:db8::/48    2001:db8:1::64           0             0 64496 i
 
Total number of prefixes 1
ubuntu# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3,
       I - ISIS, B - BGP, * - FIB route.
 
C>* ::1/128 is directly connected, lo
B>* 2001:db8::/48 [20/0] via fe80::a00:27ff:fe15:bd18, eth0, 00:03:02
C>* 2001:db8:1::/64 is directly connected, eth0
C>* fe80::/64 is directly connected, eth0

So the change worked: Router1 now has its default route installed against the Ethernet interface, ath0. Furthermore, Router2 has installed the link-local address of ath0. I did a few more tests, changing the dummy interface to a tap interface and an Ethernet interface, just to ensure the interface type was having no effect. I also did a lot of renaming to confirm these results. The conclusion is that Quagga is making these decisions based on whichever interface comes first in a lexicographic search. This must be a bug, so I sent a mail to the quagga-dev list and will await an answer.
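The behaviour I observed can be modelled in a couple of lines. To be clear, this is only a sketch of what the tests suggest, not Quagga’s actual code, and it only considers the two interfaces involved; both function names are invented for the example.

```python
def preferred_interface(names):
    """Model of the observed behaviour: the lexicographically first name wins."""
    return min(names)

def rename_is_safe(uplink, anycast):
    """True if the uplink would be picked ahead of the anycast interface."""
    return preferred_interface([uplink, anycast]) == uplink

print(preferred_interface(["eth0", "dummy0"]))  # dummy0
print(preferred_interface(["ath0", "dummy0"]))  # ath0
print(rename_is_safe("eth0", "dummy0"))         # False
```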

Now at this point we have a workaround: we can either rename our dummy interface or we can rename the Ethernet interfaces to ensure Quagga makes the correct decision. However, I did not really like this option. It seemed a bit fragile and I don’t really like the idea of messing around with udev or modprobe rules to fix an issue like this. So off to the Quagga documentation.

A quick search and I came across the ‘neighbor peer interface ifname‘ command. The documentation mentions this command is deprecated, however it seemed like the exact command I needed; at the very least it should fix the announcements. I gave this a shot but unfortunately it seemed to have no effect. You can see output for these tests in the quagga thread, however take note that there are a few small differences in the config used in that post.

One of my colleagues suggested setting the peering to a multihop peering, with the idea that a multihop peering would not use the link-local address. It seemed like a fair assumption; unfortunately this too had no effect.

During the initial debugging of this issue our host had mentioned a previous bug they had come across where no next hop was being set. They had to create a route-map and use ‘set ipv6 next-hop global’ to force the global next hop. As we were not having a problem with the global next hop I wasn’t confident this was going to work. That said, there was no harm in trying, so I gave it a shot, but unfortunately no luck. However, all was not lost: this suggestion led us to the ‘set ipv6 next-hop local’ setting. This looked promising.

Router 1 – quagga config

router bgp 64496
!
 bgp router-id 192.0.2.200
!
 neighbor 2001:DB8:1::1 remote-as 64497
 no neighbor 2001:DB8:1::1 activate
 neighbor 2001:DB8:1::1 description upstream
!
 address-family ipv6
  neighbor 2001:DB8:1::1 activate
  neighbor 2001:DB8:1::1 soft-reconfiguration inbound
  neighbor 2001:DB8:1::1 prefix-list prefix-v6 out
  neighbor 2001:DB8:1::1 route-map FIX-v6-NEXTHOP out
  network 2001:DB8::/48
 exit-address-family
!
ipv6 prefix-list prefix-v6 seq 2 permit 2001:DB8::/48
route-map FIX-v6-NEXTHOP permit 10
 set ipv6 next-hop local fe80::a00:27ff:fe15:bd18

In the above, the link-local address used is the one for our eth0 interface, and it seems we have some success. This change fixed the announcements: regardless of the interface name, the correct link-local address would be sent. However, we still had the problem that any routes received were being installed against the dummy interface.
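Hand-copying a link-local address into a route-map across a couple of hundred nodes is asking for typos, so here is a rough sketch of generating the stanza instead. It parses a trimmed copy of the `show interface` output from earlier in this post; `link_locals` and `render_route_map` are hypothetical helpers, and how the output gets collected and pushed to each node is left out.

```python
import ipaddress
import re

# Trimmed copy of the Router1 "show interface" output shown earlier.
SHOW_INTERFACE = """\
Interface dummy0 is up, line protocol detection is disabled
  inet6 2001:db8::42/48
  inet6 fe80::7c52:97ff:fe39:f4b7/64
Interface eth0 is up, line protocol detection is disabled
  inet6 2001:db8:1::64/64
  inet6 fe80::a00:27ff:fe15:bd18/64
"""

def link_locals(show_interface_output):
    """Map each interface name to its link-local address, if it has one."""
    result = {}
    current = None
    for line in show_interface_output.splitlines():
        header = re.match(r"Interface (\S+) is", line)
        if header:
            current = header.group(1)
            continue
        addr = re.match(r"\s+inet6 (\S+)/\d+", line)
        if addr and current and ipaddress.IPv6Address(addr.group(1)).is_link_local:
            result[current] = addr.group(1)
    return result

def render_route_map(name, link_local):
    """Render the route-map stanza used in the config above."""
    return f"route-map {name} permit 10\n set ipv6 next-hop local {link_local}\n"

print(render_route_map("FIX-v6-NEXTHOP", link_locals(SHOW_INTERFACE)["eth0"]))
```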

My first thought was to try and disable link-local addresses on the dummy interface. I checked the documentation on IPv6 kernel parameters; alas, the magic switch I was looking for did not exist. Further reading of the IPv6 RFCs led me to the following statement: “All interfaces are required to have at least one Link-Local unicast address”. That didn’t sound promising, however the loopback interface does not have a link-local address, so perhaps all is not lost. So I tried the following:

iface lo inet6 loopback
   address 2001:DB8::42/47

Unfortunately this did not work as anticipated; the loopback interface came back with no IPv6 address. I was starting to run out of ideas, well, sane ones at least, so I thought I would try and ask for some input from ##network on freenode. Now I can imagine some of you are probably thinking “I thought you were seeking sane ideas”. Well, in my defence it was getting late and I had had a few whiskeys, so I was prepared for IRC.

So, IRC. Straight away I got the response that the config is broken: fix it, problem solved. I tried to explain that I knew this, but I still thought Quagga was wrong and wanted to get a workaround and understand if this was a sane decision for Quagga to make. However, this is IRC, so it obviously didn’t go that way. Anyway, after about 30 minutes someone suggested using lo (although it was only because lo would come after eth in the search). I mentioned that I had tried this with little success. This is when they pointed out that I had only tried the interface script and hadn’t tried setting it with the ip command. You can see the relevant parts of the IRC chat here if you’re bored.

Anyway, using ‘ip addr add 2001:db8::42/47 dev lo’ seemed to work, so we now have an interface with a global IPv6 address and no link-local address. Looks like we are onto a winner. Let’s see what it looks like. Notice below that Router1 is now using zzz0 to ensure it is last in a lexicographic search.

Router1

router1# show interface
Interface lo is up, line protocol detection is disabled
  index 1 metric 0 mtu 16436
  flags: <UP,LOOPBACK,RUNNING>
  inet 127.0.0.1/8
  inet6 ::1/128
  inet6 2001:db8::42/47
Interface zzz0 is up, line protocol detection is disabled
  index 2 metric 0 mtu 1500
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  HWaddr: 08:00:27:15:bd:18
  inet 192.168.1.200/24 broadcast 192.168.1.255
  inet6 2001:db8:1::64/64
  inet6 fe80::a00:27ff:fe15:bd18/64
router1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv6, I - IS-IS, B - BGP, A - Babel,
       > - selected route, * - FIB route

B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, zzz0, 00:00:33
C>* ::1/128 is directly connected, lo
C>* 2001:db8::/47 is directly connected, lo
C>* 2001:db8:1::/64 is directly connected, zzz0
C>* fe80::/64 is directly connected, zzz0

Router2 remains the same as after the route-map fix. Finally all looks good: Router2 has the correct route to us and the routes we receive have been installed against the correct interface on Router1. Win! I just needed to get this working with the network config scripts. I retried the lo config above to make sure I hadn’t missed anything, and I also tried adding the following:

iface lo inet6 loopback
  post-up ip -f inet6 addr add 2001:db8::42/47 dev lo

Unfortunately neither of these would apply a v6 address to the loopback after a reboot, so instead I added the ip command to /etc/rc.local. This worked: the system rebooted and had the correct address. Just time to check the routing table again.

router1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv6, I - IS-IS, B - BGP, A - Babel,
       > - selected route, * - FIB route

B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, lo, 00:00:50
C>* ::1/128 is directly connected, lo
C>* 2001:db8::/47 is directly connected, lo
C>* 2001:db8:1::/64 is directly connected, zzz0
C>* fe80::/64 is directly connected, zzz0

Nooooooo, the route is on lo again :(. It turns out the loopback fix only worked because I added the v6 address after Quagga had already inserted the route into its routing table. Restarting Quagga (on either side), or adding the address before Quagga had inserted its route, both result in the same behaviour as with the dummy interface, with one exception. As there is no link-local address on lo, Router2 inserts the global address into its routing table, so that side of things would still work without the route-map.

router2# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv6, I - IS-IS, B - BGP, A - Babel,
       > - selected route, * - FIB route

C>* ::1/128 is directly connected, lo
B>* 2001:db8::/48 [20/0] via 2001:db8:1::64, eth1, 00:08:56
C>* 2001:db8:1::/64 is directly connected, eth1
C>* fe80::/64 is directly connected, eth1
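Given how easy it was to miss a BGP route landing on the wrong interface, a small check over `show ipv6 route` output seems worthwhile. This is a sketch only: the regex is written against the output format shown in this post and may need adjusting for other Quagga versions, and `misplaced_bgp_routes` is a made-up name.

```python
import re

# Matches selected BGP routes such as:
# B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, lo, 00:00:50
ROUTE_LINE = re.compile(r"^B>\*\s+(\S+)\s+\[\d+/\d+\]\s+via\s+(\S+),\s+(\S+),")

def misplaced_bgp_routes(show_route_output, uplinks):
    """Return (prefix, interface) for BGP routes not installed on an uplink."""
    bad = []
    for line in show_route_output.splitlines():
        match = ROUTE_LINE.match(line)
        if match and match.group(3) not in uplinks:
            bad.append((match.group(1), match.group(3)))
    return bad

sample = """\
B>* ::/0 [20/0] via fe80::a00:27ff:fe02:e9f2, lo, 00:00:50
C>* ::1/128 is directly connected, lo
C>* 2001:db8:1::/64 is directly connected, zzz0
"""
print(misplaced_bgp_routes(sample, uplinks={"zzz0"}))  # [('::/0', 'lo')]
```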

Conclusion
Well, I’ll be honest: when I started writing this article I thought the loopback interface fix had solved things. Yes, the bit where I say “just time to check the routing table again” after getting the loopback interface to work; everything after that was written “live”. I realised that I had come to the loopback conclusion at 2am, so I should probably check it. As I tested, I found it hard to reproduce the supposed fix I had seen the previous night.

Anyway, I guess the conclusion is: don’t make the mistake in the first place. In this example dummy0 should be addressed as a /128 and then there is no problem. However, we all know mistakes happen, and as engineers we should try to reduce the impact of the same mistake happening again. Without a better solution it seems I have looped back to my original solution and will be recommending that we rename the dummy interface to something that loses the lexicographical search race. I will also work to add the route-map that forces the IPv6 link-local next hop.

Using the loopback interface caused other issues due to the fact that there is no link-local address. With the loopback interface the next hop for the default route becomes unreachable, so it is probably best to stick with the dummy interface.

root@router1:~# ip -6 route
2001:db8:1::/64 dev zzz0  proto kernel  metric 256
unreachable 2001:db8::/47 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev zzz0  proto kernel  metric 256
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
unreachable default dev lo  proto zebra  metric 1024  error -101