Why Is Multi-Cloud Networking So Complicated?

One of the challenges many SaaS products face as they mature into enterprise markets is a growing demand for private networking. Companies are often wary of allowing access to potentially sensitive internal services over the internet – and in our experience at Immuta, a Data Security Platform, this is especially important to many of our customers.

Immuta SaaS is hosted in the Amazon Web Services (AWS) cloud, and so it has long supported private connectivity to customer data sources using AWS PrivateLink. However, implementing a similar solution for other cloud providers, such as Microsoft Azure and Google Cloud Platform (GCP), is much more complicated.

The question is: why?

First, we need to understand a bit about how the internet works under the hood. Buckle up!

The internet is not magic

When you visit a website in your browser, it likely takes around 100ms to receive an initial response from the server that hosts the website, and another few seconds to finish rendering all the content on the page. During those initial milliseconds, your browser looks up the website’s public IP address using DNS, the internet’s address book, and then sends a request to it. The server responds with an empty content frame and a set of scripts that make additional calls to retrieve the static and dynamic content to fill it.
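You can watch those milliseconds yourself with a few lines of Python – a rough sketch that separates the DNS lookup from the time-to-first-byte:

```python
import socket
import time
import http.client

host = "www.immuta.com"

# Time the DNS lookup: ask a resolver to translate the hostname
# into a public IP address.
t0 = time.perf_counter()
ip = socket.gethostbyname(host)
dns_ms = (time.perf_counter() - t0) * 1000

# Time the initial request: TCP handshake, TLS negotiation, and the
# first HTTP response, all rolled into one round trip to the server.
t0 = time.perf_counter()
conn = http.client.HTTPSConnection(host, timeout=10)
conn.request("GET", "/")
resp = conn.getresponse()
ttfb_ms = (time.perf_counter() - t0) * 1000

print(f"{host} resolved to {ip} in {dns_ms:.0f} ms")
print(f"First response ({resp.status}) arrived after {ttfb_ms:.0f} ms")
```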

This explanation contains an enormous amount of hand-waving, particularly in the phrases “sending a request” and the word “responds.” Let’s dig a bit deeper into what’s happening.

If you trace the route that your packets are taking on their journey, you might be surprised. For example, here’s me hitting a website:

[Image: a traceroute map of my packets’ path across the US – https://www.immuta.com/wp-content/uploads/2024/07/Multi-Cloud-Map.png]

After several hops from my local machine to the edge of our VPN, my packets are routed to Kansas, by way of San Francisco, Washington DC, and Phoenix. On paper, this seems massively inefficient. If you took a series of flights like this, the journey would take you 19+ hours.
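You can reproduce a trace like this yourself. Traceroute works by sending probes with steadily increasing time-to-live (TTL) values, so each router along the path reveals itself when it discards an expiring packet. Here’s a minimal sketch using the scapy library (it crafts raw packets, so it needs root privileges):

```python
from scapy.all import IP, ICMP, sr1  # pip install scapy; run as root

def trace(dest: str, max_hops: int = 30) -> None:
    for ttl in range(1, max_hops + 1):
        # Each probe's TTL is decremented at every router; when it hits
        # zero, that router replies with an ICMP "time exceeded" message,
        # revealing its address.
        reply = sr1(IP(dst=dest, ttl=ttl) / ICMP(), timeout=2, verbose=0)
        if reply is None:
            print(f"{ttl:2d}  *")              # hop didn't answer
        elif reply.type == 11:                 # ICMP time-exceeded
            print(f"{ttl:2d}  {reply.src}")
        else:                                  # echo reply: we've arrived
            print(f"{ttl:2d}  {reply.src}  (destination)")
            break

trace("141.193.213.21")
```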

The results speak for themselves, however. I was able to get a response from that Kansan server in ~100ms. How is this possible?

Routing requests via the Border Gateway Protocol

Let’s go back to the beginning of our request. The DNS server that is an authority on the domain I’m trying to reach tells my browser that the IP address of the server I want is 141.193.213.21.

Ok… where on earth is that, and how do we get there?

Unfortunately, we can’t simply look it up. There is no Google Maps equivalent for the internet. Unlike road maps in real life, the internet cannot rely on Static Routing. There is a limited pool of public IPv4 addresses, and unlike physical addresses, they can be reused for different things – an IP address can be pointed at a different location on a whim. Google Maps may be sophisticated enough to understand traffic patterns and parse many static routes to find the most efficient path to a destination, but it couldn’t handle that destination’s address being moved arbitrarily, even in the middle of a journey.

Instead of static routing, the internet uses Dynamic Routing, in which a series of physical computers broadcast, or Advertise, the routes they can send traffic to using the Border Gateway Protocol (BGP). Your computer makes a series of connections to these machines, “hopping” from one to another as each forwards your packets along the most efficient route it can find to your destination. These physical computers are called Routers, and they are owned by different Network Operators, including Internet Service Providers (yeah, those jerks), tech companies, universities, banks, government institutions, etc.

Network operators each control one or more Autonomous Systems (AS): collections of the IP addresses they are assigned by the Internet Assigned Numbers Authority (IANA), each identified by an Autonomous System Number (ASN). Every Autonomous System has its own routing policies based on business agreements with other operators and on technical considerations. These policies may include details like preferential routing, traffic shaping, and load balancing.
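You can look up which Autonomous System announces a given address yourself. One way is the third-party ipwhois library, which queries the registries’ RDAP service (the structured successor to WHOIS):

```python
from ipwhois import IPWhois  # pip install ipwhois

# Ask the regional internet registry which Autonomous System
# announces this address, and who operates it.
result = IPWhois("141.193.213.21").lookup_rdap(depth=1)
print(result["asn"], result["asn_description"])
```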

The advertised routes and routing policies are taken into account when performing the complex BGP calculations needed to determine the ideal route to a destination. The miracle of the protocol is how fast it makes these calculations, despite the internet’s massive and ever-increasing complexity.
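Real BGP best-path selection weighs a long list of criteria (local preference, AS path length, origin type, and more), but the core idea – prefer the route your policy likes best, then the shortest AS path – fits in a few lines. This is a drastically simplified sketch with made-up private ASNs, not the actual algorithm:

```python
# Routes a router has learned for one destination prefix, each advertised
# by a neighbor along with the chain of ASes back to the origin.
routes = [
    {"next_hop": "peer-a", "as_path": [64601, 64700, 64800], "local_pref": 100},
    {"next_hop": "peer-b", "as_path": [64602, 64800],        "local_pref": 100},
    {"next_hop": "peer-c", "as_path": [64603, 64700, 64800], "local_pref": 200},
]

def best_path(routes):
    # Simplified BGP decision process: highest local preference wins
    # (business policy), then the shortest AS path (fewest networks
    # to traverse) breaks the tie.
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

print(best_path(routes)["next_hop"])  # -> peer-c: policy beats path length
```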

What does this have to do with multi-cloud?

BGP is not just how public Autonomous Systems talk to each other, but how private ones do as well. When we rebuilt Immuta’s SaaS network in the second half of 2022, we migrated from a static-route-driven network to one that used BGP-driven dynamic routing to create a global network mesh. Similar to the way I described the internet above, our internal Wide Area Network (WAN) uses BGP to propagate routing information across software-defined routers in our AWS private network.

It’s not just the routers that are software-defined – the entirety of our network, from EC2 interface to global-segment-defining network policy, is virtual. This abstracts away the minutiae of networking equipment configuration, cabling, port security, redundancy, etc., leaving us with a clean API that lets us control only the things we care about.
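We haven’t named the service here, but AWS Cloud WAN is the kind of managed offering this describes: you declare segments and edge locations in a JSON policy document, and AWS runs the software-defined routers and BGP mesh for you. A purely illustrative sketch – the core network ID and policy below are hypothetical, not our actual configuration:

```python
import json
import boto3

networkmanager = boto3.client("networkmanager")

# A minimal Cloud WAN core network policy: AWS provisions software-defined
# routers in each edge location and propagates routes between them via BGP.
policy = {
    "version": "2021.12",
    "core-network-configuration": {
        "asn-ranges": ["64512-64555"],  # private ASNs for the mesh routers
        "edge-locations": [
            {"location": "us-east-1"},
            {"location": "eu-west-1"},
        ],
    },
    "segments": [
        {"name": "production", "require-attachment-acceptance": False},
    ],
}

# "core-network-..." is a placeholder ID, not a real resource.
networkmanager.put_core_network_policy(
    CoreNetworkId="core-network-0123456789abcdef0",
    PolicyDocument=json.dumps(policy),
)
```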

We’re paying AWS for that privilege, of course. But it’s easy to argue that the ability to build a stable, secure, and extensible global networking and compute layer in six months is a significant advantage we gained by not having to deal with hardware. This bubble of luxury has a border, however – and to scale our network across cloud providers, we’re going to have to pop it.

Leaving home

If we want to send traffic to a location outside the AWS network, we have to build the bridges between Autonomous Systems ourselves. This comes down to one of two options:

  1. Find a co-location site with Cloud Provider networking gear we can hook into, in every region where we need to operate, then procure and set up the equipment and cabling needed to route from our network in AWS to our target.
  2. Buy a multi-cloud router solution from a Cloud Exchange Provider (CEP).

For many companies, the first option is prohibitively expensive and operationally infeasible, so they usually explore the second by evaluating different CEPs. There are nuanced differences between the various multi-cloud router products on the market, but I want to focus on why their solutions are so fundamentally different from a traditional hyperscaler’s network offerings.

Metal and glass have no API

When you get down to brass tacks, Cloud Exchange Providers rent out their networking hardware. To connect our network to Azure, for example, we procure bandwidth on specific routers in specific data centers and set up Direct Connect (AWS) and ExpressRoute (Azure) connections to those routers.
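On the AWS side, that means standing up a Direct Connect virtual interface that speaks BGP to the exchange provider’s router. A hedged sketch with boto3 – every ID, VLAN, and ASN below is hypothetical:

```python
import boto3

dx = boto3.client("directconnect")

# Create a private virtual interface over an existing Direct Connect
# connection ("dxcon-..." is a placeholder). The "asn" is our side of
# the BGP session; the exchange's router peers with it over the
# link-local addresses below.
dx.create_private_virtual_interface(
    connectionId="dxcon-0123abcd",
    newPrivateVirtualInterface={
        "virtualInterfaceName": "to-cloud-exchange",
        "vlan": 101,                        # 802.1Q tag agreed with the CEP
        "asn": 64512,                       # our private BGP ASN
        "amazonAddress": "169.254.0.1/30",
        "customerAddress": "169.254.0.2/30",
        "directConnectGatewayId": "dxgw-0123456789abcdef0",
    },
)
```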

Dropping your virtual networking abstraction means you need to account for hardware maintenance, route redundancy, and a whole host of other issues. You are also now responsible for your own network operations management – there’s no cloud provider NOC watching for TCP retransmits, jitter, latency, or outages across your multi-cloud network. That job is now yours.
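Even a basic probe makes the point: once you own the path, you own the measurements too. Here is a minimal sketch of the kind of check a NOC runs continuously, using nothing but TCP connect times (the endpoint hostname is hypothetical):

```python
import socket
import statistics
import time

def probe(host: str, port: int = 443, samples: int = 10) -> None:
    """Measure TCP connect latency and jitter to a remote endpoint."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass                                 # handshake only, then close
        times.append((time.perf_counter() - t0) * 1000)
        time.sleep(0.5)
    # Jitter here is the mean absolute difference between consecutive
    # samples, a common approximation of interarrival jitter (RFC 3550).
    jitter = statistics.mean(abs(a - b) for a, b in zip(times, times[1:]))
    print(f"latency {statistics.mean(times):.1f} ms, jitter {jitter:.1f} ms")

probe("peering-endpoint.example.com")  # hypothetical endpoint
```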

There’s also no single API you can use to configure this infrastructure. Most CEPs offer tools to make it easier, but because each cloud provider is so different and customers’ requirements are so varied, they can’t build a turnkey solution. All they can do is provide an interface for you to build the cloud interconnects yourself.

In order to cross clouds, you have to deal with the reality of the internet: metal, glass, and silicon.

A note on IPSec

You may have noted the conspicuous absence of the most common bridge networking solution: IPSec tunnels. All major cloud providers offer solutions for creating encrypted tunnels between networks using the IPSec protocol, and it’s probably the quickest way to set up cross-cloud connectivity.

IPSec – short for Internet Protocol Security – is a set of rules that extend the Internet Protocol with support for encryption and authentication. HTTPS does something similar, but only for HTTP traffic – other transmission control protocol (TCP) and user datagram protocol (UDP) traffic that also needs encryption is left out. IPSec, on the other hand, operates at Layer 3 of the OSI model, so it can encrypt all types of traffic.

IPSec is most commonly used to create encrypted tunnels between networks across the commodity internet, as it tends to be much less expensive than dedicated interconnects and requires far less engineering effort to operationalize. From a reliability perspective, however, it leaves a lot to be desired.

IPSec tunnels fail regularly, so they are usually deployed with a second, failover tunnel that takes over traffic to the same destination. The failover process can be pretty slow – sometimes on the order of minutes – and traffic is dropped in the meantime. IPSec also introduces latency which, depending on the workload, can be unacceptable.
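To their credit, the cloud providers bake that redundancy in for you: an AWS Site-to-Site VPN connection, for example, always comes with two IPSec tunnels, and running BGP over them (rather than static routes) lets traffic shift to the surviving tunnel automatically. A hypothetical sketch with boto3 – the gateway IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Create a Site-to-Site VPN connection. AWS provisions two IPSec tunnels
# terminating on separate endpoints; with StaticRoutesOnly=False, BGP
# sessions run inside both tunnels, so failover is automatic (though, as
# noted above, not instantaneous).
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId="cgw-0123456789abcdef0",  # placeholder
    TransitGatewayId="tgw-0123456789abcdef0",   # placeholder
    Options={"StaticRoutesOnly": False},
)
print(vpn["VpnConnection"]["VpnConnectionId"])
```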

It’s critical for us to ensure that the networking under our control has the highest availability and fault tolerance possible. We have invested no small amount of engineering effort to ensure that network partition events on our infrastructure do not interrupt our services.

Feature spotlight: Private connectivity over Azure PrivateLink

We’re happy to announce that private connectivity to Databricks, Snowflake, and Starburst over Azure PrivateLink is now available in Preview for Immuta SaaS customers. We have built a reliable, highly available bridge network between AWS and Azure on top of our cloud exchange. Customers can contact Immuta Support to request that private networking be configured for their environment.

Additionally, our AWS PrivateLink offerings for Snowflake and Databricks are now Generally Available.

This adds to the growing list of Cloud Providers and Data Platforms that we support private connectivity to:
AWS PrivateLink

  • Amazon Redshift (Preview)
  • Databricks (GA)
  • Snowflake (GA)
  • Starburst (Preview)

Azure PrivateLink

  • Databricks (Preview)
  • Snowflake (Preview)
  • Starburst (Preview)

We’re continuously working to meet the high standards for security and compliance our customers expect, and to provide them with robust, reliable networking solutions. Immuta SaaS will continue to expand its private networking offerings for additional Data Platforms and Cloud Providers. Stay tuned for more details!
