In Part 1 of this blog series we discussed some common secure architecture patterns for connecting to your data securely using Azure Data Factory and Synapse Pipeline Integration Runtimes. We covered the Azure IR with Managed VNet, Self-hosted IRs (SHIR), and using a combination of the two.
In Part 2 we’re going to discuss a newer, much less common architecture pattern. As mentioned in Part 1, I have only seen this approach mentioned in a Microsoft tutorial, which I believe surfaced some time in late 2021, and I only know of one client that has attempted to use it. I therefore spent some time figuring out exactly how it works, and I thought I would offer up my own explanation. We’ll also discuss the pros and cons of the approach, and at the end we’ll update our flow diagram of how to choose the right runtime for your scenario.
The architecture we’re going to discuss offers a way of connecting to on-prem data sources using the Azure IR with Managed VNet, instead of having to use an SHIR. If you need a reminder on these, head back to Part 1. The way this is achieved is by using Azure Private Link Service with IP Forwarding, which I will explain more about later.
The architecture diagram is as follows:
Now, I know this diagram isn’t very pretty to look at. It’s definitely quite complex and there’s lots going on, so let’s break it down. The network route I’m going to explain is going from the Azure IR all the way to this example on-prem SQL Server. This route is highlighted below.
Azure IR with Managed VNet
We should already be familiar with the left-hand side of this diagram from Part 1; this is our standard Azure IR with Managed VNet setup. Ignoring the middle Managed Private Endpoint for now, we have an Azure IR inside a Managed VNet with two Managed Private Endpoints for talking to our Azure resources (in this case an Azure SQL Database and an Azure Blob Storage Account).
The right-hand side should not look too unfamiliar either. This is our on-prem data source sitting inside our corporate network (in this case a SQL Server), and we’re using ExpressRoute to connect our on-prem corporate network to the cloud. This is the same approach we had in our SHIR scenario from Part 1 (however, we don’t have any SHIRs here).
Azure Private Link Service - the stuff in the middle!
So what’s all the stuff in the middle? Specifically, this stuff:
This is a standard architecture pattern for using Azure Private Link Service (PLS), which in this case is being used to facilitate secure network connectivity between the Azure IR and the on-prem SQL Server. Let’s dig into this architecture a bit more.
Azure Private Link Service in detail
This diagram is from the Microsoft documentation on Azure PLS.
Azure PLS is Microsoft’s recommended way of allowing private networks to securely access a service running behind an Azure Load Balancer. In the above diagram, we have some consumers on the left-hand side which are inside a private network; let’s call it the consumer network. There are two examples of consumers here: one accessing the service from on-prem and one from a VM. The service they’re trying to consume is being hosted on the VMs on the right-hand side, which are also inside a private network; let’s call it the provider network. I’ve highlighted the consumers and providers below.
To facilitate secure connectivity between the consumer network and the provider network, we first ensure that the VMs providing the service are sat behind a Load Balancer. We then deploy an Azure PLS inside the provider network, connected to the Load Balancer. For the consumers to be able to consume the service from the provider, their traffic now needs to be directed to the PLS. To achieve that, we deploy a Private Endpoint for the PLS inside the consumer network (resource type: Microsoft.Network/privateLinkServices).
The consumers can now securely consume the service via the Private Endpoint.
Applying the PLS use case to our IR scenario
So let’s switch back to our IR architecture diagram, and explore how the PLS use case we just discussed applies to it.
In our example, the consumer is the Azure IR, and the consumer network is the Managed VNet that the IR sits in. The service that it’s trying to consume is our on-prem database. For now, we are going to say that the VMs highlighted in the diagram above are the ones providing the service of the database. Once we are comfortable with the route between the Azure IR and the VMs, we will then cover the final part of the route between the VMs and the on-prem SQL Server (hint: this is where the IP Forwarding comes in).
Hopefully you can see how the middle of our architecture diagram is relatively similar to the PLS architecture we just looked at.
We can see that we have:
- Some VMs providing a service sat inside a private network;
- The VMs are sat behind a Load Balancer;
- The Load Balancer is connected to an Azure PLS, which is also deployed inside the provider network;
- There is a Private Endpoint deployed in the consumer network which is connected to the PLS. In our case this is a Managed Private Endpoint inside the Azure IR Managed VNet.
Therefore, with these components in place, the Azure IR is able to securely connect to the VMs. We’ll now cover the last portion of the route, which is between the VMs and the on-prem SQL Server.
When the network traffic reaches either of the VMs, the VM looks at its firewall and asks “what should I do with this traffic?” It could deny the traffic access, for example. In our scenario, we are going to utilise IP Forwarding - which is essentially just a rule on the VM’s firewall that says “pass this traffic on to a specific IP address”.
So, for our use case, we add the rule to pass any traffic hitting the VM onto the IP address of our SQL Server. The VM simply acts as a forwarder for the network traffic.
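To make the forwarding rule a little more concrete, here is a minimal sketch of what it could look like, assuming the forwarding VMs run Linux and use iptables NAT rules. That is my assumption for illustration; the IP addresses, the NIC name `eth0`, and the helper `build_forwarding_commands` are all hypothetical. The function only builds the command strings, it does not run them.

```python
from ipaddress import ip_address


def build_forwarding_commands(vm_ip, target_ip, port=1433, nic="eth0"):
    """Build (but do not run) shell commands that forward TCP traffic
    arriving on `port` at the VM to the same port on `target_ip`.

    Assumes a Linux VM with iptables; all argument values are illustrative.
    """
    vm = ip_address(vm_ip)          # raises ValueError for a malformed address
    target = ip_address(target_ip)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")

    return [
        # Let the kernel pass packets between interfaces at all.
        "sysctl -w net.ipv4.ip_forward=1",
        # Rewrite the destination of incoming traffic to the on-prem server.
        f"iptables -t nat -A PREROUTING -i {nic} -p tcp --dport {port} "
        f"-j DNAT --to-destination {target}:{port}",
        # Rewrite the source to the VM so replies come back through it.
        f"iptables -t nat -A POSTROUTING -o {nic} -p tcp -d {target} "
        f"--dport {port} -j SNAT --to-source {vm}",
    ]


if __name__ == "__main__":
    # Hypothetical addresses: forwarding VM 10.100.0.4, SQL Server 10.0.0.47.
    for command in build_forwarding_commands("10.100.0.4", "10.0.0.47"):
        print(command)
```

The second rule (SNAT) is there so that the SQL Server sends its replies back to the VM rather than trying to answer the original sender directly.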
There is one additional piece of configuration required inside ADF/Synapse to facilitate this connectivity. When creating the Managed Private Endpoint for the PLS, you will need to add the FQDN of the on-prem SQL Server as an additional parameter. This serves as a mapping between the domain name of your server and the private IP address of the Managed Private Endpoint. This ensures that when the Linked Service for your on-prem server initiates a request to the domain name, the Azure IR knows where to send the traffic - which is to the Managed Private Endpoint.
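As a rough sketch of that configuration, the snippet below builds the kind of properties body a Managed Private Endpoint pointing at a PLS might carry. The property names `privateLinkResourceId` and `fqdns` reflect my understanding of the ADF managed private endpoint resource, so please check them against the current REST reference before relying on them. The subscription ID, resource group, PLS name, and FQDN are all made up, and `managed_private_endpoint_body` is a hypothetical helper.

```python
def managed_private_endpoint_body(pls_resource_id, fqdns):
    """Return a request-body-shaped dict for a Managed Private Endpoint
    that targets a Private Link Service.

    Property names are an assumption based on the ADF resource shape.
    """
    fqdn_list = [name.strip().lower() for name in fqdns if name.strip()]
    if not fqdn_list:
        raise ValueError("at least one FQDN of the on-prem server is required")
    return {
        "properties": {
            # The PLS sitting in front of the Load Balancer and forwarding VMs.
            "privateLinkResourceId": pls_resource_id,
            # Names that should resolve to the endpoint's private IP.
            "fqdns": fqdn_list,
        }
    }


# Hypothetical identifiers, for illustration only.
PLS_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-forwarding"
    "/providers/Microsoft.Network/privateLinkServices/pls-onprem-sql"
)

if __name__ == "__main__":
    print(managed_private_endpoint_body(PLS_ID, ["sqlserver.corp.example.com"]))
```

The Linked Service would then use the same FQDN as its server name, so the name in the connection string and the name registered on the endpoint line up.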
You will be pleased to know that this completes our network route!
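If it helps to see the whole route in one place, here is a small sketch that models each hop and the network it lives in. The hop and network names come from the diagram as described in this post; the `Hop` class and the helper functions are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    network: str


# The route from the Azure IR to the on-prem SQL Server, in order.
ROUTE = [
    Hop("Azure IR", "Managed VNet"),
    Hop("Managed Private Endpoint", "Managed VNet"),
    Hop("Private Link Service", "Forwarding Virtual Network"),
    Hop("Load Balancer", "Forwarding Virtual Network"),
    Hop("Forwarding VM", "Forwarding Virtual Network"),
    Hop("ExpressRoute", "ExpressRoute circuit"),
    Hop("On-prem SQL Server", "Corporate network"),
]


def describe(route):
    """Return the route as a single arrow-separated string."""
    return " -> ".join(hop.name for hop in route)


def network_crossings(route):
    """Return (from_network, to_network) pairs where the route changes network."""
    return [
        (a.network, b.network)
        for a, b in zip(route, route[1:])
        if a.network != b.network
    ]


if __name__ == "__main__":
    print(describe(ROUTE))
    for source, destination in network_crossings(ROUTE):
        print(f"crosses from {source} to {destination}")
```

Counting the crossings is a quick way to see how many network boundaries you would need to reason about when debugging this route.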
Since we are using the Azure IR with Managed VNet in this scenario, we do inherit some of the pros of that approach. For example, you aren’t responsible for managing either the compute infrastructure or the network security of your IR.
However, I’m not sure you’re really getting any benefit here, because, yes, you don’t have to manage the compute infrastructure of your IR, but you are gaining two more VMs which you are responsible for deploying and maintaining. Similarly, yes, you don’t have to manage the network security of your IR, but you are gaining a whole other VNet that you are responsible for deploying and maintaining (labelled “Forwarding Virtual Network” in the diagram).
The most obvious con here is that it’s quite a complex architecture, and there are a lot of networking components. You need to be confident that you have the knowledge within your team to deploy, maintain, and potentially debug such an architecture before choosing this approach.
In summary, I would say that the trade-off of not having to manage the IR, but gaining a lot more networking infrastructure, is not worth it. In my opinion, the SHIR approach is a much simpler architecture to deploy and maintain.
The only scenario I can think of where this approach would be useful is if you are already using an Azure PLS to facilitate secure connectivity to other services. In that case, you would already have the majority of the infrastructure set up, and you would just need to deploy an additional (Managed) Private Endpoint for your Azure IR. I would not recommend setting up Azure PLS simply for on-prem connectivity for ADF/Synapse.
Conclusion: How to choose the best architecture for your scenario
In Part 1 we saw a flow diagram helping you to choose the best architecture for your scenario. Here is the updated flow diagram to include the Azure IR with Managed VNet and IP Forwarding option.
Please do let me know if you’re using this approach or have seen it being used. I’d be really interested to hear your thoughts!