A friend recently asked me:
I would like to know how to build a global CDN network. I’m currently exploring all the CDN options on the market, but would like to know how to build one since it looks like an interesting problem to solve.
A Content Delivery Network (CDN) is a service that lets you host content (mostly static assets or video streams) on nodes around the world. This makes delivery fast and reduces the load on your website or application. In this post I would like to talk about the strategies involved in building a CDN.
If you are designing a CDN, your primary goal should be to get the content as close to the user as possible. That means being close on the network path, and also caching or storing the content physically close to the user.
Caching or storing content at the edge
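You can either host the content at the edge, or cache it there and proxy requests to the origin where the content is stored. Even the proxying approach cuts latency considerably. The user's TCP and TLS handshakes terminate at the nearby edge node instead of the distant origin, and the edge can reuse long-lived connections to the origin. With TLS 1.2, a new HTTPS request costs roughly 4 round trips: 1 for the TCP handshake, 2 for the TLS handshake and 1 for the request itself. Each of those round trips becomes much shorter. Caching improves performance further, since cache hits never touch the origin at all. This is a good enough trade-off for most CDNs.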
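To get a feel for the numbers, here is a back-of-the-envelope model in Python. It is a rough sketch that assumes a TLS 1.2 full handshake with no session resumption or TCP Fast Open. It also assumes the edge node already holds a warm connection to the origin. The RTT values are made up for illustration.

```python
# Back-of-the-envelope time-to-first-byte (TTFB) model for a fresh HTTPS request.
# Assumptions (illustrative only): TLS 1.2 full handshake, no session resumption
# or TCP Fast Open, and the edge already holds a warm connection to the origin.

TCP_HANDSHAKE_RTTS = 1
TLS12_HANDSHAKE_RTTS = 2
REQUEST_RTTS = 1  # the HTTP request/response itself
NEW_SESSION_RTTS = TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS + REQUEST_RTTS  # = 4


def direct_ttfb(rtt_user_origin: float) -> float:
    """User opens TCP + TLS straight to a distant origin."""
    return NEW_SESSION_RTTS * rtt_user_origin


def edge_ttfb(rtt_user_edge: float, rtt_edge_origin: float, cache_hit: bool) -> float:
    """User terminates TCP + TLS at a nearby edge node."""
    ttfb = NEW_SESSION_RTTS * rtt_user_edge
    if not cache_hit:
        # Cache miss: one extra round trip over the warm edge -> origin connection.
        ttfb += rtt_edge_origin
    return ttfb


if __name__ == "__main__":
    # Hypothetical RTTs in milliseconds.
    print(direct_ttfb(160))                     # 640
    print(edge_ttfb(10, 150, cache_hit=False))  # 190
    print(edge_ttfb(10, 150, cache_hit=True))   # 40
```

Even a cache miss comes out well ahead in this model. The expensive handshakes happen over the short user-to-edge path, and only one round trip crosses the long edge-to-origin path.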
Storing the data on the node closest to the user is better still. One approach is to write software that pushes the most frequently requested content to the nodes nearest to where it is requested. You would also need to integrate this with DNS, so that users are directed to a node that actually holds the content.
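Here is a minimal sketch of the "push the most requested data" idea. It assumes you can parse edge access logs into hypothetical `(region, object_key)` pairs. The `plan_pushes` function and the log format are illustrative and do not come from any real tool. The function counts requests per region and picks the top K objects to pre-position on that region's node.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_pushes(request_log: Iterable[Tuple[str, str]], top_k: int) -> Dict[str, List[str]]:
    """For each region, return the top_k most requested object keys.

    request_log holds hypothetical (region, object_key) pairs, e.g. parsed
    from edge access logs.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region, key in request_log:
        counts[region][key] += 1
    # Counter.most_common returns keys sorted by descending request count.
    return {region: [key for key, _ in c.most_common(top_k)]
            for region, c in counts.items()}


if __name__ == "__main__":
    log = [
        ("mumbai", "/video/a.mp4"),
        ("mumbai", "/video/a.mp4"),
        ("mumbai", "/img/b.png"),
        ("delhi", "/img/b.png"),
    ]
    print(plan_pushes(log, 1))
    # {'mumbai': ['/video/a.mp4'], 'delhi': ['/img/b.png']}
```

A separate process would then copy each list to the matching regional node. DNS would send that region's users to that node.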
Using GeoDNS to target locations
With GeoDNS, you can return different DNS answers depending on where the user is. The authoritative DNS server is configured to return a different IP depending on the source IP of the resolver making the query. A GeoIP database is normally used to map that IP to a region.
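Here is a toy version of that lookup in Python. It assumes a tiny hand-written GeoIP table instead of a real GeoIP database. The prefixes, regions and node addresses are hypothetical, and `geodns_answer` is an illustrative name. A real authoritative server with GeoDNS support does this lookup for you.

```python
import ipaddress
from typing import Optional

# Hypothetical GeoIP table (prefix -> region). Private ranges stand in for
# real allocations; a real deployment would query a GeoIP database instead.
GEOIP = {
    ipaddress.ip_network("10.0.0.0/8"): "delhi",
    ipaddress.ip_network("10.20.0.0/16"): "mumbai",
    ipaddress.ip_network("172.16.0.0/12"): "singapore",
}

# Hypothetical region -> CDN node address (documentation range 192.0.2.0/24).
NODES = {"delhi": "192.0.2.11", "mumbai": "192.0.2.10", "singapore": "192.0.2.20"}
DEFAULT_NODE = "192.0.2.1"


def lookup_region(source_ip: str) -> Optional[str]:
    """Map the querying resolver's IP to a region via longest-prefix match."""
    addr = ipaddress.ip_address(source_ip)
    matches = [net for net in GEOIP if addr in net]
    if not matches:
        return None
    return GEOIP[max(matches, key=lambda net: net.prefixlen)]


def geodns_answer(source_ip: str) -> str:
    """Return the A record a GeoDNS authoritative server might hand back."""
    region = lookup_region(source_ip)
    if region is None:
        return DEFAULT_NODE
    return NODES[region]


if __name__ == "__main__":
    print(geodns_answer("10.20.1.1"))   # 192.0.2.10 (mumbai: /16 beats /8)
    print(geodns_answer("10.5.5.5"))    # 192.0.2.11 (delhi)
    print(geodns_answer("192.0.2.99"))  # 192.0.2.1 (no match, default node)
```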
The caveat is that many people use third-party resolvers like Google Public DNS. In that case you see the IP of the recursive resolver making the query, not the user's IP. For example, queries from users in India may reach your server from Google's resolvers in Singapore. EDNS Client Subnet (ECS) solves this. The resolver includes a truncated version of the client's address in the query, typically accurate to a /24 for IPv4 and a /56 for IPv6, provided both the resolver and your authoritative server support it. Many open-source and hosted authoritative DNS servers support it. AWS Route 53, DNSMadeEasy and self-hosted options like PowerDNS support both EDNS Client Subnet and GeoDNS.
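Both sides of ECS can be sketched with Python's standard `ipaddress` module. This illustration makes some assumptions. The hypothetical `ecs_subnet` helper truncates to /24 for IPv4 and /56 for IPv6, although real resolvers choose their own source prefix length. The hypothetical `locate_query` helper shows that the authoritative server should geolocate the ECS subnet when one is present and fall back to the resolver's IP otherwise.

```python
import ipaddress
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def ecs_subnet(client_ip: str) -> Network:
    """Truncate a client address the way an ECS-enabled resolver typically
    might before forwarding it (assumed: /24 for IPv4, /56 for IPv6)."""
    addr = ipaddress.ip_address(client_ip)
    prefixlen = 24 if addr.version == 4 else 56
    # strict=False zeroes out the host bits instead of raising an error.
    return ipaddress.ip_network(f"{addr}/{prefixlen}", strict=False)


def locate_query(resolver_ip: str, ecs: Optional[Network]) -> str:
    """Pick the address to geolocate: the ECS subnet when the resolver sent
    one, otherwise the resolver's own source IP."""
    if ecs is not None:
        return str(ecs.network_address)
    return resolver_ip


if __name__ == "__main__":
    subnet = ecs_subnet("198.51.100.77")
    print(subnet)                               # 198.51.100.0/24
    print(ecs_subnet("2001:db8:abcd:12ff::1"))  # 2001:db8:abcd:1200::/56
    print(locate_query("192.0.2.53", subnet))   # 198.51.100.0
    print(locate_query("192.0.2.53", None))     # 192.0.2.53
```

The address returned by `locate_query` is what you would feed into the GeoIP lookup, instead of the resolver's own address.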
The limitation of this approach is that it is good at determining a user's approximate location, but not at sending traffic to the best node when you have multiple nodes in the same region. The next approach addresses that.
BGP and IP anycast
With IP anycast, you announce the same IP range from multiple locations, and BGP routes each user to the node that is nearest in BGP terms. That usually means the shortest AS path rather than the shortest physical distance. The benefits of this approach are enormous. You no longer have to rely on accurate geolocation data, which matters more and more as you keep adding nodes. You can also peer directly with, or host your content inside, a customer-facing ISP network. However, you have to be careful, because there are cases like the following where it will not work well:
Node 1 (uplink: AS X, location: Mumbai)
Node 2 (uplink: AS Y, location: Delhi)
Customer (uplink: AS Y, location: Mumbai)
The customer's traffic would land on the Delhi node, because the route through AS Y is the shortest in the eyes of BGP, even though the Mumbai node is physically closer. Ideally you should have the same uplinks in all regions and peer selectively, or use BGP communities extensively to steer traffic.
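The Mumbai/Delhi example above can be reproduced with a heavily simplified model of BGP best-path selection. This sketch only compares LOCAL_PREF and AS-path length. Real BGP has many more tie-breakers. The private AS numbers standing in for AS X, AS Y and the CDN's own AS are hypothetical.

```python
from dataclasses import dataclass
from typing import List

OUR_AS = 64500          # hypothetical ASN for the CDN
AS_X, AS_Y = 64501, 64502  # hypothetical ASNs for the two uplinks


@dataclass
class Route:
    node: str            # which anycast node originated this announcement
    as_path: List[int]   # AS path as seen inside AS Y (the customer's uplink)
    local_pref: int = 100


def best_route(routes: List[Route]) -> Route:
    """Simplified BGP decision: highest LOCAL_PREF, then shortest AS path.
    Real BGP continues with origin, MED, eBGP vs iBGP, IGP cost, and more."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Anycast prefix as learned by AS Y:
routes = [
    # Node 1 (Mumbai) sits behind AS X, so AS Y only hears it via AS X.
    Route(node="Node 1 (Mumbai)", as_path=[AS_X, OUR_AS]),
    # Node 2 (Delhi) is connected to AS Y directly.
    Route(node="Node 2 (Delhi)", as_path=[OUR_AS]),
]

print(best_route(routes).node)  # Node 2 (Delhi)
```

AS Y hears the Delhi announcement directly and the Mumbai one only through AS X, so the shorter AS path wins. Geography never enters the decision, and the Mumbai customer ends up in Delhi.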
Another strategy that networks like Akamai and Netflix use is to place their servers inside an ISP's network and peer with the ISP, announcing their own routes and receiving the ISP's routes. They can then direct traffic to that node with DNS. Apart from the advantage of having content very close to the user, this can also improve performance for networks behind carrier-grade NAT (CG-NAT). There's a fascinating video about how Facebook's network was scaled up.
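Steering users to an in-ISP node with DNS comes down to knowing which client prefixes each ISP announces to you. The following is a hypothetical sketch and does not reflect Akamai's or Netflix's actual systems. The class, method and node names are illustrative. It records the prefixes learned over each embedded node's BGP session and picks the node with the most specific matching prefix. It returns `None` so the caller can fall back to GeoDNS or anycast.

```python
import ipaddress
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class EmbeddedNodeDirectory:
    """Hypothetical control-plane table: which in-ISP node serves which
    client prefixes, as learned over each node's BGP session with its ISP."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Network]] = {}

    def update_from_bgp(self, node: str, prefixes: List[str]) -> None:
        # Replace the node's prefix list with what its ISP currently announces.
        self._routes[node] = [ipaddress.ip_network(p) for p in prefixes]

    def steer(self, client_ip: str) -> Optional[str]:
        """Return the node whose ISP announced the most specific prefix
        covering client_ip, or None to fall back to the public CDN."""
        addr = ipaddress.ip_address(client_ip)
        best: Optional[str] = None
        best_len = -1
        for node, nets in self._routes.items():
            for net in nets:
                if addr in net and net.prefixlen > best_len:
                    best, best_len = node, net.prefixlen
        return best


if __name__ == "__main__":
    directory = EmbeddedNodeDirectory()
    directory.update_from_bgp("isp-a-mumbai-1", ["198.51.100.0/24"])
    directory.update_from_bgp("isp-b-delhi-1", ["203.0.113.0/24"])
    print(directory.steer("198.51.100.42"))  # isp-a-mumbai-1
    print(directory.steer("192.0.2.1"))      # None
```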
The main takeaway from the video is that no solution is perfect, so an iterative approach to scaling up the network is a good idea.
Don't build one at all!
Large CDNs such as Akamai, Cloudflare, StackPath and AWS CloudFront have a huge lead in proximity to users and in features. Building one from scratch could be more trouble than simply using their services. From my understanding, many of them are open to bulk deals if you have enough volume. There are also BunnyCDN and KeyCDN if you want something more affordable.
I hope you found this post useful. I'm planning to build something that leverages these ideas. Stay tuned.