Cloud Storage Examples
Examples for exporting data directly to cloud storage platforms.
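The examples below assume the credentials for the target bucket or container are already available to FastTransfer. A minimal PowerShell sketch, assuming the standard AWS SDK environment variables are honored for S3 targets (an assumption for illustration; check the documentation of your FastTransfer version for the exact credential mechanism it supports):
# Export AWS credentials before launching FastTransfer (assumed, not a documented contract)
$env:AWS_ACCESS_KEY_ID     = "<your-access-key-id>"
$env:AWS_SECRET_ACCESS_KEY = "<your-secret-access-key>"
$env:AWS_REGION            = "eu-west-1"   # region of the target bucket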
AWS S3
Export data directly to Amazon S3 buckets, organizing output by date.
S3 with Date-Based Paths
Organize files by date using date segments in the target path. The command below writes to a fixed path; the sketch that follows shows one way to derive the date segment at run time:
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "SalesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "DailySales" `
--directory "s3://my-bucket/data/sales/full/" `
--fileoutput "sales.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "sale_id" `
--paralleldegree 8 `
--runid "sales_to_s3_daily"
S3 with Nested Structure
Create hierarchical folder structures; here --merge "True" combines the parallel workers' output files into a single file:
.\FastTransfer.exe `
--connectiontype "pgsql" `
--server "localhost" `
--port "5432" `
--database "ecommerce" `
--user "postgres" `
--password "postgres" `
--sourceschema "public" `
--sourcetable "orders" `
--directory "s3://analytics-bucket/exports/orders/" `
--fileoutput "orders.parquet" `
--parallelmethod "DataDriven" `
--distributekeycolumn "o_orderdate" `
--paralleldegree 10 `
--merge "True"
S3 with CSV Format
Export CSV files to S3 with manual date partitioning (a loop sketch after the command shows how to backfill several dates):
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LogsDB" `
--trusted `
--query "SELECT * FROM ApplicationLogs WHERE log_date = '2024-05-01'" `
--directory "s3://logs-bucket/app-logs/appdate=2024-05-01/" `
--fileoutput "app_logs.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd HH:mm:ss" `
--encoding "UTF-8"
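To backfill several days with the same partition layout, the command can be wrapped in a loop. A minimal sketch reusing the table and bucket from the example above (the date list is illustrative):
# Export one CSV partition per day
foreach ($day in @('2024-05-01', '2024-05-02', '2024-05-03')) {
  .\FastTransfer.exe `
  --connectiontype "mssql" `
  --server "localhost" `
  --database "LogsDB" `
  --trusted `
  --query "SELECT * FROM ApplicationLogs WHERE log_date = '$day'" `
  --directory "s3://logs-bucket/app-logs/appdate=$day/" `
  --fileoutput "app_logs.csv" `
  --decimalseparator "." `
  --delimiter "|" `
  --dateformat "yyyy-MM-dd HH:mm:ss" `
  --encoding "UTF-8"
}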
Azure Blob Storage
Export data to Azure Blob Storage containers. A date segment can be injected into the container path the same way as in the S3 sketches above.
Azure with Date Partitioning
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "AnalyticsDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Events" `
--directory "abs://mystorageaccount.blob.core.windows.net/datacontainer/events/full/" `
--fileoutput "events.parquet" `
--parallelmethod "Ntile" `
--distributekeycolumn "event_id" `
--paralleldegree 10
Azure Data Lake Gen2
Export to Azure Data Lake Storage Gen2. The command matches the Blob Storage example above, except the target URI uses the abfss:// scheme and the dfs.core.windows.net endpoint:
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "AnalyticsDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Events" `
--directory "abfss://mystorageaccount.dfs.core.windows.net/datacontainer/events/full/" `
--fileoutput "events.parquet" `
--parallelmethod "Ntile" `
--distributekeycolumn "event_id" `
--paralleldegree 10
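The {sourcetable} placeholder (used again in the best-practices example at the end of this page) can be combined with a shell loop to export several tables in one script. A minimal sketch reusing the connection settings above; the table names are hypothetical, and the parallel options are omitted since each table would need its own distribution column:
# Export each table to its own folder; FastTransfer substitutes {sourcetable} in the path
foreach ($table in @('Events', 'Sessions', 'PageViews')) {
  .\FastTransfer.exe `
  --connectiontype "mssql" `
  --server "localhost" `
  --database "AnalyticsDB" `
  --trusted `
  --sourceschema "dbo" `
  --sourcetable "$table" `
  --directory "abfss://mystorageaccount.dfs.core.windows.net/datacontainer/{sourcetable}/full/" `
  --fileoutput "$($table).parquet"
}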
Hourly Exports
Create exports with hourly granularity.
Hourly Data Partition by Sensor with an Incremental Time Range Derived Externally
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "StreamingDB" `
--trusted `
--query "SELECT * FROM SensorData WHERE reading_time >= '2024-02-03 14:00:00' and reading_time < '2024-02-03 15:00:00'" `
--directory "s3://iot-bucket/sensor-data/" `
--fileoutput "sensors_20240203140000_20240203150000.parquet" `
--parallelmethod "DataDriven" `
--distributekeycolumn "sensor_id" `
--datadrivenquery "SELECT sensor_id FROM SensorList" `
--paralleldegree 8 `
--runid "hourly_sensor_export_20240203140000_20240203150000"
Best Practices for Cloud Exports
Performance Tips
Use Parquet for cloud storage: better compression and query performance, and less network traffic.
Enable parallel execution when needed:
- if you need to split the data on a business criterion
- if you have a lot of data to extract
Use a negative value for --paralleldegree to automatically set the thread count to (CPU cores / abs(degree)), leaving resources for other processes.
E.g. --paralleldegree -2 uses half the cores of the machine from which FastTransfer is launched.
Use --datadrivenquery to:
- speed up retrieval of the list of elements to extract
- implement incremental extraction, by putting the WHERE filter in --datadrivenquery instead of the main --query
Keep the degree at a level your source can sustain, to avoid saturating it.
Example with All Best Practices
.\FastTransfer.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataWarehouse" `
--trusted `
--sourceschema "dbo" `
--sourcetable "FactSales" `
--directory "s3://analytics-bucket/warehouse/sales/{sourcetable}" `
--fileoutput "FactSales.parquet" `
--parallelmethod "DataDriven" `
--datadrivenquery "SELECT ref_date from calendar where ref_date > cast(getdate()-30 as date)" `
--distributekeycolumn "sale_date" `
--paralleldegree -2 `
--runid "daily_sales_to_s3" `
--loglevel "Information"
Build your command with the Wizard