Multi-WAN failover without session termination

By | 5th April 2018

One reason people run BGP is that reliability is paramount and abruptly terminated sessions simply cannot be afforded, especially for applications like VoIP. However, this is an expensive proposition in terms of both cost and skill. There may be a middle ground between full BGP multihoming and simple failover across multiple uplinks. That is what I will be exploring here.

What you will need

  1. Multiple ISP uplinks with public IPs, commonly called leased lines.
  2. A VM in the cloud with 2 public IPs (it doesn’t matter if they are on the same subnet). Many providers offer additional IPs for a nominal cost.
  3. Basic knowledge about GRE, OSPF, BGP.

In this example I am running everything in GNS3 on Mikrotik CHR routers. Most router firmware should be able to do what I am doing here. For the cloud VM side, you can run Quagga on Linux for the dynamic routing protocols. This post gives a basic gist of how to proceed.

To make things clear, here’s how I have set up my GNS3 homelab to simulate this.

Testing setup with GNS3

Create GRE tunnels

First I created two GRE tunnels. The endpoints in my case were 192.168.10.201/24 and 192.168.10.202/24 for tunnel1, and 192.168.2.2/24 and 192.168.3.1/24 for tunnel2. I then added IPs on either end of the newly created tunnels: 192.168.111.201/24 and 192.168.111.202/24 on tunnel1, and 192.168.112.201/24 and 192.168.112.202/24 on tunnel2. I used /24s; a /30 per tunnel would be preferable, since each tunnel only needs two addresses.
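
As a rough sketch, the tunnel setup on one of the two routers might look like the following in RouterOS v6-style CLI. The interface names gre-tunnel1 and gre-tunnel2 are made up for illustration, and which router holds the .201 addresses is an assumption on my part. The other end mirrors this with local and remote addresses swapped and the .202 tunnel IPs.

```routeros
# Assumed: this router owns 192.168.10.201 and 192.168.2.2 on its uplinks
/interface gre
add name=gre-tunnel1 local-address=192.168.10.201 remote-address=192.168.10.202
add name=gre-tunnel2 local-address=192.168.2.2 remote-address=192.168.3.1

# Addresses on the tunnel interfaces themselves
/ip address
add address=192.168.111.201/24 interface=gre-tunnel1
add address=192.168.112.201/24 interface=gre-tunnel2
```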

There may be ways to run 2 GRE tunnels on the same IP or make it more secure. I haven’t explored that.

Run OSPF on top

I then ran OSPF. I added the newly created GRE networks 192.168.111.0/24 and 192.168.112.0/24 on either end. Then I originated 192.168.113.1/32 and 192.168.113.2/32, which sit on loopback interfaces at either end.
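
A minimal sketch of this step on the router holding 192.168.113.1, again in RouterOS v6-style CLI. RouterOS has no dedicated loopback type, so an empty bridge is commonly used instead; the name loopback0 is hypothetical, and using the loopback address as the router ID is my assumption rather than something stated above.

```routeros
# Empty bridge acting as a loopback interface (name is illustrative)
/interface bridge
add name=loopback0
/ip address
add address=192.168.113.1/32 interface=loopback0

# Assumed: loopback address doubles as the OSPF router ID
/routing ospf instance
set [find default=yes] router-id=192.168.113.1

# Advertise both tunnel networks and the loopback /32
/routing ospf network
add network=192.168.111.0/24 area=backbone
add network=192.168.112.0/24 area=backbone
add network=192.168.113.1/32 area=backbone
```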

You will have to play around to get the OSPF convergence times down. I used the following settings:

  • Retransmit interval: 3s
  • Transmit delay: 1s
  • Hello interval: 5s
  • Router dead interval: 10s

Experiment and measure how long it takes to converge after a link failure. Also make sure you set a higher cost on the link that you want to be secondary. I used OSPF because of its fast convergence times.
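
The timers and costs above could be applied per tunnel interface roughly as follows (RouterOS v6-style CLI). The interface names and the cost values 10 and 100 are placeholders I picked for illustration; any pair where the secondary is higher works. The hello and dead intervals have to match on both ends of a link, otherwise the adjacency will not form.

```routeros
/routing ospf interface
# Primary tunnel: lower cost, so it is preferred while it is up
add interface=gre-tunnel1 cost=10 hello-interval=5s dead-interval=10s retransmit-interval=3s transmit-delay=1s
# Secondary tunnel: higher cost, used only when the primary is gone
add interface=gre-tunnel2 cost=100 hello-interval=5s dead-interval=10s retransmit-interval=3s transmit-delay=1s
```

With a 10s dead interval, a silent uplink failure can take up to about 10 seconds to be noticed before traffic moves to the other tunnel.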

Use iBGP to propagate routes

You can originate 0.0.0.0/0 from the cloud VM and the internal network you are planning to use (192.168.20.0/24 in my case) from the home router. Run BGP between the /32s we originated and propagated with OSPF, so the session stays up as long as either tunnel does. You can use a private ASN for this purpose.
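
A sketch of the iBGP session in RouterOS v6-style CLI. Everything not mentioned above is an assumption: AS 65000 is just an arbitrary private ASN, loopback0 is a hypothetical name for the interface holding the /32, the peer names are invented, and I am guessing that the home router is the one with 192.168.113.1.

```routeros
# --- Home router (assumed loopback 192.168.113.1) ---
/routing bgp instance
set default as=65000 router-id=192.168.113.1
/routing bgp peer
add name=cloud-vm remote-address=192.168.113.2 remote-as=65000 update-source=loopback0
/routing bgp network
add network=192.168.20.0/24

# --- Cloud VM (assumed loopback 192.168.113.2) ---
/routing bgp instance
set default as=65000 router-id=192.168.113.2
/routing bgp peer
# default-originate=always sends 0.0.0.0/0 to the home router
add name=home remote-address=192.168.113.1 remote-as=65000 update-source=loopback0 default-originate=always
```

Peering between the loopbacks rather than the tunnel addresses is what lets the BGP session survive the loss of one tunnel.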

NAT everything together

Before this, make sure the default route on the cloud VM points to the correct gateway.

You’ll need to use source NAT. Point out-interface and out-address to the interface/IP from which you want traffic to exit the cloud VM.
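
On RouterOS the rule might look something like this. Note that RouterOS calls the translated address to-addresses. The interface ether1 and the address 203.0.113.10 (from the documentation range) are placeholders, and restricting the rule to the 192.168.20.0/24 source is my own addition.

```routeros
/ip firewall nat
# Rewrite traffic from the home network to the chosen public IP on the way out
add chain=srcnat src-address=192.168.20.0/24 out-interface=ether1 action=src-nat to-addresses=203.0.113.10
```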

Testing everything

The easiest way is to start an SSH session from the end host and remove uplinks one at a time to simulate failures. The session should stall briefly while OSPF reconverges and then carry on.
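
While pulling links, a few read-only commands on either router can help confirm what the routing protocols are doing. This is only a suggested checklist in RouterOS v6-style CLI, not the exact steps I ran:

```routeros
# OSPF adjacencies: one neighbor should disappear after the dead interval
/routing ospf neighbor print
# The BGP session should stay established throughout
/routing bgp peer print status
# On the home router: the default route learned over BGP
/ip route print where dst-address=0.0.0.0/0
```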

Everything working as it should on the first attempt felt great. If you need any help, you can always reach out to me. Anyway, time to go to bed 😀
