Azure Configuration
Complete guide for configuring Azure Blob Storage and Azure Data Lake Gen2 connectivity with FastTransfer.
Supported Azure Storage Services
| Service | Protocol | Best For |
|---|---|---|
| Azure Blob Storage | abs:// | General-purpose object storage |
| Azure Data Lake Gen2 | abfss:// | Big data analytics and hierarchical namespaces |
Authentication Methods
FastTransfer supports multiple Azure authentication methods:
- Azure CLI (recommended for local development)
- Connection String (simple but less secure)
- Storage Account Key (via environment variables)
- Managed Identity (recommended for Azure infrastructure)
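Which method FastTransfer ends up using depends on what is configured in the environment. As a quick pre-flight check before launching a job, a sketch like the following can report what is set. The precedence shown, and the AZURE_STORAGE_ACCOUNT/AZURE_STORAGE_KEY variable names, are illustrative assumptions, not FastTransfer's documented resolution order:

```shell
# Pre-flight check: report which Azure credential source appears to be
# configured. The order below is an illustration only, and the
# AZURE_STORAGE_ACCOUNT/AZURE_STORAGE_KEY pair is an assumed convention.
detect_azure_auth() {
  if [ -n "$AZURE_STORAGE_CONNECTION_STRING" ]; then
    echo "connection-string"
  elif [ -n "$AZURE_STORAGE_ACCOUNT" ] && [ -n "$AZURE_STORAGE_KEY" ]; then
    echo "account-key"
  elif command -v az >/dev/null 2>&1; then
    echo "azure-cli"
  else
    echo "none"
  fi
}

detect_azure_auth
```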
1. Azure CLI
The simplest method for local development and testing.
Setup:
# Install Azure CLI
# Windows: Download from https://aka.ms/installazurecliwindows
# Linux: curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Login to Azure
az login
# List subscriptions
az account list --output table
# Set active subscription (if you have multiple)
az account set --subscription "My Subscription Name"
# or by ID
az account set --subscription "12345678-1234-1234-1234-123456789abc"
# Verify access to storage account
az storage account show --name mystorageaccount --resource-group myresourcegroup
FastTransfer will automatically use the Azure CLI's authentication credentials.
Using with FastTransfer:
# No additional configuration needed - just run FastTransfer
./FastTransfer \
...
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports" \
--fileoutput "data.parquet" \
...
2. Connection String
Useful for CI/CD pipelines and automated workflows.
Get Connection String:
Azure Portal:
- Navigate to your Storage Account
- Go to Security + networking → Access keys
- Copy the Connection string for key1 or key2

Azure CLI:
az storage account show-connection-string \
--name mystorageaccount \
--resource-group myresourcegroup \
--output tsv
Set Connection String:
Windows (PowerShell):
$env:AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=xxxxx;EndpointSuffix=core.windows.net"
# Run FastTransfer
.\FastTransfer.exe `
...
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports" `
--fileoutput "data.parquet" `
...

Linux/macOS:
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=xxxxx;EndpointSuffix=core.windows.net"
# Run FastTransfer
./FastTransfer \
...
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports" \
--fileoutput "data.parquet" \
...
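A connection string is a semicolon-delimited list of key=value pairs. Before handing it to FastTransfer, a hypothetical helper like this (not part of FastTransfer) can pull out individual fields for a quick sanity check:

```shell
# Extract one field from an Azure Storage connection string.
# Usage: conn_field "<connection-string>" <FieldName>
conn_field() {
  # Split on ';' into lines, then strip the "FieldName=" prefix
  # from the matching line and print the value.
  printf '%s' "$1" | tr ';' '\n' | sed -n "s/^$2=//p"
}

CONN="DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=xxxxx;EndpointSuffix=core.windows.net"
conn_field "$CONN" AccountName   # prints mystorageaccount
```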
3. Storage Account Key
Alternative to connection string using separate environment variables.
Get Storage Account Key:
Azure Portal:
- Navigate to your Storage Account
- Go to Security + networking → Access keys
- Copy key1 or key2

Azure CLI:
az storage account keys list \
--account-name mystorageaccount \
--resource-group myresourcegroup \
--query "[0].value" \
--output tsv
Set Environment Variables:
Windows (PowerShell):
$env:AZURE_STORAGE_ACCOUNT="mystorageaccount"
$env:AZURE_STORAGE_KEY="<StorageAccountKey>"
# Run FastTransfer
.\FastTransfer.exe `
...
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports" `
--fileoutput "data.parquet" `
...

Linux/macOS:
export AZURE_STORAGE_ACCOUNT="mystorageaccount"
export AZURE_STORAGE_KEY="<StorageAccountKey>"
# Run FastTransfer
./FastTransfer \
...
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports" \
--fileoutput "data.parquet" \
...
4. Managed Identity
When FastTransfer runs on Azure infrastructure, it can use Managed Identity automatically, so no credential configuration is needed.
Supported Azure Services:
- Azure Virtual Machines
- Azure App Service
- Azure Functions
- Azure Container Instances
- Azure Kubernetes Service (AKS)
Enable System-Assigned Managed Identity:
Azure Portal:
- Navigate to your Azure resource (VM, App Service, etc.)
- Go to Identity
- Enable the System assigned identity
- Save changes

Azure CLI:
# For a VM
az vm identity assign --name myVM --resource-group myResourceGroup
# For an App Service
az webapp identity assign --name myAppService --resource-group myResourceGroup
Grant Storage Access:
# Get the Managed Identity Object ID
PRINCIPAL_ID=$(az vm show --name myVM --resource-group myResourceGroup --query identity.principalId -o tsv)
# Assign Storage Blob Data Contributor role
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account}"
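The --scope argument is the full ARM resource ID of the storage account. To avoid typos when scripting role assignments, the ID can be assembled from its parts (the helper name is illustrative):

```shell
# Assemble the ARM resource ID used as --scope for a storage-account
# role assignment.
# Usage: storage_scope <subscription-id> <resource-group> <storage-account>
storage_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
    "$1" "$2" "$3"
}

storage_scope "12345678-1234-1234-1234-123456789abc" myResourceGroup mystorageaccount
```

The result can then be passed directly: `--scope "$(storage_scope ...)"`.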
Azure Blob Storage
URI Format
abs://storageaccount.blob.core.windows.net/container/path/
URI Examples
# Root of container
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/"
# Folder in container
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports"
# Nested folders
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports/sales/2024"
# Date partitioning
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/yearmonth=202401/"
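For date partitioning, the partition folder can be derived at run time rather than hard-coded. A small sketch (the function name and folder layout are illustrative):

```shell
# Build an abs:// directory URI with yearmonth= partitioning.
# Usage: abs_partition_uri <account> <container> <yearmonth>
abs_partition_uri() {
  printf 'abs://%s.blob.core.windows.net/%s/yearmonth=%s/\n' "$1" "$2" "$3"
}

# Current month's partition, e.g. yearmonth=202401
abs_partition_uri mystorageaccount mycontainer "$(date +%Y%m)"
```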
Complete Examples
Basic Export:
Windows:
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "SalesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Customers" `
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports/customers" `
--fileoutput "customers.parquet"

Linux:
./FastTransfer \
--connectiontype "mssql" \
--server "localhost" \
--database "SalesDB" \
--trusted \
--sourceschema "dbo" \
--sourcetable "Customers" \
--directory "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports/customers" \
--fileoutput "customers.parquet"
Parallel Export with Query:
Windows:
.\FastTransfer.exe `
--connectiontype "pgsql" `
--server "localhost" `
--port "5432" `
--database "ecommerce" `
--user "postgres" `
--password "postgres" `
--query "SELECT * FROM orders WHERE order_date >= '2024-01-01'" `
--directory "abs://mystorageaccount.blob.core.windows.net/container/exports/orders" `
--fileoutput "orders_2024.parquet" `
--parallelmethod "Random" `
--distributekeycolumn "order_id" `
--paralleldegree 8

Linux:
./FastTransfer \
--connectiontype "pgsql" \
--server "localhost" \
--port "5432" \
--database "ecommerce" \
--user "postgres" \
--password "postgres" \
--query "SELECT * FROM orders WHERE order_date >= '2024-01-01'" \
--directory "abs://mystorageaccount.blob.core.windows.net/container/exports/orders" \
--fileoutput "orders_2024.parquet" \
--parallelmethod "Random" \
--distributekeycolumn "order_id" \
--paralleldegree 8
Azure Data Lake Gen2
Azure Data Lake Storage Gen2 is recommended for analytics workloads due to hierarchical namespace support and better performance.
URI Format
abfss://storageaccount.dfs.core.windows.net/container/path/
- Protocol: abfss:// instead of abs://
- Endpoint: .dfs.core.windows.net instead of .blob.core.windows.net
- Features: Hierarchical namespace, directory-level operations, better performance
URI Examples
# Root of filesystem
--directory "abfss://mystorageaccount.dfs.core.windows.net/datalake"
# Folder in filesystem
--directory "abfss://mystorageaccount.dfs.core.windows.net/datalake/raw/sales"
# Hive-style partitioning
--directory "abfss://mystorageaccount.dfs.core.windows.net/datalake/sales/yearmonth=202401"
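Assuming the storage account has hierarchical namespace enabled, the same container is reachable through both endpoints, and an abs:// URI maps to its abfss:// counterpart by swapping the scheme and the endpoint suffix. A sketch of that mapping (the helper name is illustrative):

```shell
# Convert an abs:// Blob URI to the equivalent abfss:// Data Lake URI
# by swapping the scheme and the endpoint suffix. Assumes the account
# has hierarchical namespace enabled.
to_abfss() {
  printf '%s' "$1" | sed \
    -e 's|^abs://|abfss://|' \
    -e 's|\.blob\.core\.windows\.net|.dfs.core.windows.net|'
}

to_abfss "abs://mystorageaccount.blob.core.windows.net/mycontainer/exports"
# prints abfss://mystorageaccount.dfs.core.windows.net/mycontainer/exports
```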
Complete Examples
Basic Export:
Windows:
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataWarehouse" `
--trusted `
--sourceschema "dbo" `
--sourcetable "FactSales" `
--directory "abfss://mystorageaccount.dfs.core.windows.net/datalake/raw/{sourcetable}/" `
--fileoutput "FactSales.parquet"

Linux:
./FastTransfer \
--connectiontype "mssql" \
--server "localhost" \
--database "DataWarehouse" \
--trusted \
--sourceschema "dbo" \
--sourcetable "FactSales" \
--directory "abfss://mystorageaccount.dfs.core.windows.net/datalake/raw/{sourcetable}/" \
--fileoutput "FactSales.parquet"
Parallel Export with Partitioning:
Windows:
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "Analytics" `
--trusted `
--query "SELECT * FROM Events" `
--directory "abfss://mystorageaccount.dfs.core.windows.net/mycontainer/{sourcedatabase}/events/" `
--fileoutput "events.parquet" `
--parallelmethod "DataDriven" `
--distributekeycolumn "EventDate" `
--datadrivenquery "SELECT RefDate FROM Calendar WHERE RefDate BETWEEN '2024-01-01' AND '2024-03-01'" `
--paralleldegree 12

Linux:
./FastTransfer \
--connectiontype "mssql" \
--server "localhost" \
--database "Analytics" \
--trusted \
--query "SELECT * FROM Events" \
--directory "abfss://mystorageaccount.dfs.core.windows.net/mycontainer/{sourcedatabase}/events/" \
--fileoutput "events.parquet" \
--parallelmethod "DataDriven" \
--distributekeycolumn "EventDate" \
--datadrivenquery "SELECT RefDate FROM Calendar WHERE RefDate BETWEEN '2024-01-01' AND '2024-03-01'" \
--paralleldegree 12
Required Permissions
Azure RBAC Roles
Recommended Role:
- Storage Blob Data Contributor - Full read/write access to blobs
Alternative Roles:
- Storage Blob Data Owner - Full access including ACL management
- Storage Blob Data Reader - Read-only access (for verification)
Assign Role via Azure Portal
- Navigate to your Storage Account
- Go to Access Control (IAM)
- Click Add → Add role assignment
- Select Storage Blob Data Contributor
- Select user, group, or managed identity
- Click Save
Assign Role via Azure CLI
# For a user
az role assignment create \
--assignee user@example.com \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account}"
# For a managed identity
az role assignment create \
--assignee {managed-identity-object-id} \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Storage/storageAccounts/{storage-account}"