- Ben Watt
Securing Azure Storage for a Power BI Implementation
At a customer site, I developed some Power BI dataflows and datasets that connect to Azure Storage. Recently, we decided to secure the storage accounts using the built-in IP Address and Virtual Network options. It was a little challenging to get the Power BI Service to connect to the storage, so this blog is going to run you through the options & hopefully save you time and money.
Up front, there are a lot of things I won't be covering as we'd be here all day. So Private Endpoints, ExpressRoute, Resource Instances, and more are off the table for this blog. This is also focused on a Power BI environment, so workstations and the Power BI Service are the focal points.
The Configuration, the Sources & the Metrics
Firstly, set the storage account's network access to Selected Networks and we’re ready to start adding rules to grant access.
When adding rules to secure the storage account, you need to consider two types of source to grant access to: workstations, so you can develop your Power BI solution in Desktop & refresh from the source, and the Power BI Service, for scheduled refreshes of your published datasets/dataflows.
When assessing the various options, I'm measuring three factors: Security, Performance and Cost. Let's break those down:
Security: Simple enough. Your storage account is either firewalled (secure) or not (not secure), but there is a little grey area in between, which I'll cover below.
Performance: Breaks out into two things:
Compute: From the Power BI point of view, you will be using Power Query to connect to & transform the data. The mashup engine, which does all the query work, may shift to a compute resource you own and manage, as opposed to Power Query Online, which is hosted in Azure.
Latency: This becomes important when you add distance between your Power BI tenant and the storage account.
Cost: Also breaks out into two things:
Azure resources: The simple cost to provision cloud stuff.
Time/effort: Someone spending time doing stuff.
IP Address Rules
First up, for my workstation at home or office, I can add the IP addresses to the rules and just like that, I can access the storage. I use www.whatsmyip.com to get the external IP.
Remember that your public IP, from both home and work, can potentially be shared with many other people. To reduce the surface area of access, in the case of very sensitive data, you might consider using a VPN with a dedicated IP or setting up your own Azure VM with a static IP (different rules apply to Azure VMs, hold tight & we'll get to that).
That takes care of your workstation, but you also need to get access sorted for the Power BI Service to allow scheduled refreshes.
The IP ranges of Power BI and all other Azure Public Cloud services are published in a file which is updated weekly, so you can automate the download & add them to your Firewall rules. However, don't get excited because there are deal breakers here.
If your Power BI tenant is in the same Azure region as your storage account, which in most scenarios it should be, then the IP Address rules don't apply. This makes sense as within the same data center you don't want your data traversing outside & back in via the public internet.
If your Power BI tenant is in a different Azure Region, you can use the IP Address rules. The Power BI Service has well over 300 IP addresses in the public list, but there is a limit of 200 rules you can apply to a storage account.
To combat the overabundance of Power BI IP addresses against the 200 IP limit, I set up diagnostics on the Storage Account, limited to the StorageRead category. The logged caller IP addresses can be picked up via Event Hub and automatically added to your rules, so you only add the IP addresses that actually hit your storage account, on an as-needed basis.
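To see how the published list compares with the rule limit, here is a minimal Python sketch. It assumes the file has already been downloaded and parsed into a dict, and that its layout is a top-level `values` list whose entries carry a `name` and a `properties.addressPrefixes` list; that layout is my understanding of the file, not something confirmed in this post. The function names and the sample data are made up for illustration, and the sample uses documentation-range addresses, not real Power BI ranges.

```python
import ipaddress

# Limit quoted in this post; check the current Azure documentation before relying on it.
MAX_IP_RULES = 200


def ipv4_prefixes_for_tag(service_tags, tag_name):
    """Return the IPv4 prefixes listed under one service tag (hypothetical helper)."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            prefixes = entry.get("properties", {}).get("addressPrefixes", [])
            # Keep IPv4 only; IPv6 prefixes are dropped in this sketch.
            return [
                p for p in prefixes
                if ipaddress.ip_network(p, strict=False).version == 4
            ]
    return []


def fits_in_firewall(prefixes, limit=MAX_IP_RULES):
    """True if every prefix could be added as its own rule without passing the limit."""
    return len(prefixes) <= limit


# Hand-made sample in the assumed shape of the downloaded file.
sample = {
    "values": [
        {
            "name": "PowerBI",
            "properties": {
                "addressPrefixes": [
                    "203.0.113.0/28",
                    "198.51.100.16/30",
                    "2001:db8::/64",
                ]
            },
        },
        {"name": "Storage", "properties": {"addressPrefixes": ["192.0.2.0/24"]}},
    ]
}

powerbi_prefixes = ipv4_prefixes_for_tag(sample, "PowerBI")
print(len(powerbi_prefixes), fits_in_firewall(powerbi_prefixes))
```

With the real file, the same check would report that the Power BI entries do not fit, which is the deal breaker described above.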
Set your diagnostics to StorageRead
As an aside, I also set the diagnostics to write to a different storage account and created some lovely Power BI monitoring.
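The idea of only adding the addresses that actually hit the storage account can be sketched roughly as below. This is not the pipeline I built, just an illustration in Python: it assumes each diagnostic record is a dict with a `category` field and a `callerIpAddress` field in `ip:port` form (IPv4 only), which is my understanding of the storage log format. The Event Hub wiring and the call that updates the firewall are left out, and both function names are hypothetical.

```python
def caller_ips(log_records, category="StorageRead"):
    """Collect the distinct caller IPs seen in records of one log category."""
    ips = set()
    for record in log_records:
        if record.get("category") != category:
            continue
        caller = record.get("callerIpAddress", "")
        # Assumed format "ip:port" (IPv4); drop the port if one is present.
        ip = caller.rsplit(":", 1)[0] if ":" in caller else caller
        if ip:
            ips.add(ip)
    return ips


def ips_to_add(seen, existing_rules):
    """Return the seen IPs that are not yet covered by an exact-match rule."""
    return sorted(seen - set(existing_rules))


# Made-up records using documentation-range addresses.
records = [
    {"category": "StorageRead", "callerIpAddress": "203.0.113.10:51234"},
    {"category": "StorageRead", "callerIpAddress": "203.0.113.10:51300"},
    {"category": "StorageWrite", "callerIpAddress": "198.51.100.7:40000"},
    {"category": "StorageRead", "callerIpAddress": "198.51.100.20:443"},
]
existing = ["203.0.113.10"]

new_ips = ips_to_add(caller_ips(records), existing)
print(new_ips)
```

A real pipeline would presumably also confirm that each new address falls inside the published Power BI ranges before adding it, otherwise any caller that knocks on the door ends up on the allow list.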
IP Address Rule Summary
In conclusion, IP Address rules only work for your home/office workstation and, if your Power BI tenant is in a different Azure Region, for the Power BI Service. Add to this a server in your corporate network running the On-Premises Data Gateway, which is covered below.
Security: If a dedicated IP Address is used then security is high. If you use a shared IP (home/work) then it drops a little, but the overall surface area for intrusion is still immensely reduced.
Performance: No impact on network latency. The Power Query mashup engine is still hosted by Power BI, so compute performance for your queries is also good.
Cost: Some time & effort is needed to monitor your home & office IP addresses as they can rotate, which you won't have control over.
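Monitoring for a rotated IP can be as simple as checking whether your current public address is still covered by one of the rules. A minimal Python sketch, using only the standard library: the rule list and addresses are made up (documentation ranges), `is_covered` is a hypothetical helper, and how you look up your current public IP and read the rules from the storage account is left to you.

```python
import ipaddress


def is_covered(public_ip, rules):
    """True if public_ip matches any rule, given as a single address or a CIDR range."""
    address = ipaddress.ip_address(public_ip)
    return any(
        address in ipaddress.ip_network(rule, strict=False)
        for rule in rules
    )


# Made-up rules: one single address and one range.
rules = ["203.0.113.10", "198.51.100.0/24"]

print(is_covered("198.51.100.42", rules))  # True, inside the /24 range
print(is_covered("192.0.2.1", rules))      # False, so the rules need updating
```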
This diagram shows your workstation and Power BI, in a different region, getting access via the IP Address rules.
Virtual Network rules
The next option is to use the Virtual Network rules. This grants an Azure resource attached to an Azure VNet access to the storage account. This is the alternative option when the resource you are trying to connect to Azure Storage is within the same region.
If my workstation is an Azure VM, then I would simply grant access to that VNet. This works across tenants too: I have a dedicated Azure VM for my customer & they added my VNet to their Azure Storage rules. Cool!
What about the Power BI Service though? We can't use the IP Address rules and we don't have access to the Power BI Service VNet to add into our rules. Therefore, the IP and Virtual Network rules don't apply. What to do?
Here we see an Azure VM getting access as its VNet has been granted access. The VNet option doesn't give Power BI a way in though.
For the Power BI Service, we need to add a middle-man to the equation, which creates a link between Power BI and the Azure Storage account.
On-Premises Data Gateway
The on-prem Data Gateway can be installed on a server, either an Azure VM or on-premises in your corporate network. This will be used by the Power BI Service for scheduled refreshes. The server running your gateway will need connectivity to the storage account via IP Address (if in your corporate network) or Virtual Network (if it's an Azure VM) rules.
With the addition of the Data Gateway, your Power Query workloads will move to the server, hence a reasonable level of compute will be required.
On-Premises Data Gateway Summary
Security: Security is high. The only thing to consider is a dedicated external IP from your corporate network.
Performance: Adding an on-premises Data Gateway in your corporate network to allow the Power BI Service to reach a Storage Account in the same Region is going to add latency. In fact, avoid this option for Azure Storage, or indeed any other data source already in Azure. It's not called "on-premises" for nothing.
You will need to monitor your VM performance as your Power Query engine will move to the VM hosting the gateway (for the queries that require the gateway).
You may get away with a 2-4 core VM if the workload is very light, but be prepared to scale up if you're churning significant data volumes. The actual recommended minimum is 8 cores.
Cost: The Gateway does not cost anything to install. If you use an Azure VM, you will likely need an 8-16 core VM, so you're looking at over 600 USD/EUR per month for the 16-core option, using a DS16as VM instance as an example.
Add to that the time and effort to monitor & maintain the Gateway software, update it manually, and troubleshoot.
With the on-premises Data Gateway configured, we have now created these additional routes. One is an obvious performance killer, given the journey it adds!
Virtual Network Data Gateway
There is a new-ish service in public preview at the moment called the Virtual Network Data Gateway. This is a managed gateway meaning you don't need a VM & don't need to install/maintain the software.
There are limitations, so let's get these out there first.
It's in public preview, which means it doesn't have an SLA, but support services still apply.
It's a Premium-only feature (Premium Capacity or Premium-per-user licence). Keep an eye on this, as it might change when it goes GA.
Only 11 sources are supported, at the time of writing.
The configuration and installation of the VNet Data Gateway is not overly cumbersome, but if you aren't both the Azure tenant admin and the Power Platform admin then you're going to need to ask some people nicely to make some changes at Azure Subscription and Power Platform Admin level.
The three steps are clearly articulated at this Create virtual network data gateways Docs page. There are a few things to configure across your tenant, but the steps are clear and I found it easy to follow & apply.
With all the caveats of a public preview service aside, it's a fantastic way to allow the Power BI Service access to your Azure Storage account.
VNet Data Gateway Summary
Security: Security is high. Not much else to say there.
Performance: Network latency is not affected, as your traffic zooms through the data center pipes! Your Power Query mashup engine is fully hosted, which I believe gives you better performance than the VM option, depending on the size of your VM.
Cost: The VNet Data Gateway does cost money. It costs nothing to spin up but has data transfer costs. They vary per Zone, but as an example scenario, let's say you transfer 100GB in & 100GB out over a month in Zone 1: you will get change out of 10 USD/EUR for the month. Bargain!
The Virtual Network Data Gateway adds a direct route from the Power BI Service to the Azure Storage account. Nice!
Want to see all of the options in one go? Yeah, me too. By the way, blue is traffic within the data center (via Virtual Network rules) and green is traffic from outside (via IP Address rules).
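For those who prefer code to diagrams, the routes covered in this post can be summarised in a small Python lookup. This is just one reading of the options above, not an Azure API: the function name, the source labels and the route descriptions are all shorthand invented for this sketch.

```python
def routes_to_storage(source, same_region=True):
    """List the routes from this post that let `source` reach the storage account.

    `source` is one of "workstation", "azure_vm" or "power_bi_service".
    `same_region` only matters for the Power BI Service.
    """
    if source == "workstation":
        return ["IP Address rule for the workstation's public IP"]
    if source == "azure_vm":
        return ["Virtual Network rule for the VM's VNet"]
    if source == "power_bi_service":
        routes = [
            "VNet Data Gateway",
            "On-Premises Data Gateway on an Azure VM, via a Virtual Network rule",
            "On-Premises Data Gateway in the corporate network, via an IP Address rule (adds latency)",
        ]
        if not same_region:
            # IP Address rules only come into play across regions.
            routes.insert(0, "IP Address rules for the Power BI ranges (200-rule limit)")
        return routes
    raise ValueError(f"unknown source: {source}")


for route in routes_to_storage("power_bi_service", same_region=True):
    print(route)
```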
Hopefully this has steered you in the right direction if you're in this situation.
On-premises Data Gateway
Power BI VNet Gateway
Azure Storage Firewall and Virtual Networks
Azure IP Ranges - Public Cloud