Version: 0.14

Parallel Parameters

FastTransfer can parallelize data export to significantly improve performance. This section covers the parameters that control parallel execution.

Parallel Method

Use the -M or --parallelmethod parameter to specify how data will be split across parallel threads.

Windows
Linux

.\FastTransfer.exe `
...
 --parallelmethod Ntile `
...

./FastTransfer \ 
...
 --parallelmethod Ntile \
...

Syntax:

Short form: -m <method>
Long form: --parallelmethod <method>

Available Methods

None - No parallelism.

DataDriven - Use the distinct value of the distributeKeyColumn (which can be a column or an expression) to distribute data.

tip

You can use an expression in the distribute key column instead of a column name.
Example: YEAR(o_orderdate)

Ctid - Recommended and exclusive for PostgreSQL and PostgreSQL compatible databases. Use the Ctid pseudo column (for pgsql and pgcopy source only).

note

Ctid parallel method is for PostgreSQL databases only.

Physloc - Uses an internal hidden field to retrieve chunks of data. Each parallel thread exports a portion based on the Physloc range.

note

Physloc parallel method is for SQL Server databases only.

Random - Use a modulo on the distributeKeyColumn to distribute data.

Rowid - For Oracle sources only: use rowid slices.

note

Rowid parallel method is for Oracle databases only.

RangeId - Use a numeric range to distribute data (useful with an identity column or sequence without gaps. The column must be numerical).

Ntile - Use the ntile function to distribute data evenly. Data can be numerical, date, datetime or string.

NZDataSlice - Netezza source only. Use the data slices to distribute data retrieval.

note

NZDataSlice parallel method is for Netezza databases only.

Methods Comparison

Method	Parallel	Needs Distributed Column	Database Source Type
`None`			Any
`DataDriven`			Any
`Ctid`			PostgreSQL (pgsql/pgcopy)
`Physloc`			SQL Server (mssql)
`Random`			Any
`Rowid`			Oracle (oraodp)
`RangeId`			Any
`Ntile`			Any
`NZDataSlice`			Netezza

Default Value: None

When to Use Parallelism

If you want to use a parallel export/import use other than None. Try to use Parallelism only if you have a large amount of data to transfer (more than 1M cells).

Distribute Key Column

Use the -K or --distributekeycolumn parameter to define the column (or computation) on the data source that will be used to split the data into several parts.

FastTransfer will use SQL queries that run in parallel against the source. Each query will have a WHERE clause that retrieves a part of the total data.

Information

This parameter is mandatory when using methods that require a distributed column: Random, DataDriven, RangeId, or Ntile.

Windows
Linux

# Using a column name
.\FastTransfer.exe `
...
 --distributekeycolumn order_date `
...

# Using an expression
.\FastTransfer.exe `
...
 --distributekeycolumn "YEAR(order_date)" `
...

# Using a column name
./FastTransfer \
...
 --distributekeycolumn order_date \
...

# Using an expression
./FastTransfer \
...
 --distributekeycolumn "YEAR(order_date)" \
...

Syntax:

Short form: -K <column_name>
Long form: --distributekeycolumn <column_name>

Degree of Parallelism

Use the -P or --degree parameter to control how many parallel threads will be used for the export.

DOP Values

Positive value (e.g., 4) - Uses exactly that number of parallel threads. If greater than the number of CPU cores/threads, it will be downscaled to match available cores.

0 - Automatically aligns with the number of cores (or threads if Hyper-Threading is enabled) on the machine.

Negative value (e.g., -2) - Computed as number of cores / abs(dop). For example, if you have 16 cores and set DOP to -2, the actual DOP will be 8.

Windows
Linux

# Use 8 parallel threads
.\FastTransfer.exe `
...
 --degree 8 `
...

# Auto-detect based on CPU cores
.\FastTransfer.exe `
...
 --degree 0 `
... 

# Use half of available cores
.\FastTransfer.exe `
...
 --degree -2 `
...

# Use 8 parallel threads
./FastTransfer \
...
 --degree 8 \
...

# Auto-detect based on CPU cores
./FastTransfer \
...
 --degree 0 \
...

# Use half of available cores
./FastTransfer \ 
...
 --degree -2 \
...

Syntax:

Short form: -P <value>
Long form: --degree <value>

Default: -2

Data Driven Query

Use the -Q or --datadrivenquery parameter when using the DataDriven method to provide a query that returns the list of values that will be used to split the data. This allows you to filter the values that will be exported and used to split the data.

Windows
Linux

.\FastTransfer.exe `
...
 --datadrivenquery "SELECT tagname FROM tags" `
...

./FastTransfer \
...
 --datadrivenquery "SELECT tagname FROM tags" \
...

Windows
Linux

.\FastTransfer.exe `
...
 --datadrivenquery "SELECT o_orderdate FROM dim_date where ref_date > getdate() - 30" `
...

./FastTransfer \
...
 --datadrivenquery "SELECT o_orderdate FROM dim_date where ref_date > getdate() - 30" \
...

Syntax:

Short form: -Q "<value>"
Long form only: --datadrivenquery "<query>"

Complete Example

Here's a complete example using parallel parameters with the DataDriven method:

Windows
Linux

.\FastTransfer.exe `
 --sourceconnectiontype "mssql" `
 --sourceserver "localhost" `
 --sourcedatabase "tpch" `
 --sourcetrusted `
 --targetconnectiontype "msbulk" `
 --targetserver "localhost" `
 --targetdatabase "tpch_copy" `
 --targettrusted `
 --targetschema "dbo" `
 --targettable "orders" `
 --query "select * from orders" `
 --parallelmethod "DataDriven" `
 --distributekeycolumn "o_ordermonth" `
 --datadrivenquery "SELECT to_char(d, 'YYYYMM') AS month FROM generate_series(DATE '1998-01-01', DATE '1998-12-01',INTERVAL '1 month') AS d" `
 --degree 10

./FastTransfer \
 --sourceconnectiontype "mssql" \
 --sourceserver "localhost" \
 --sourcedatabase "tpch" \
 --sourcetrusted \
 --targetconnectiontype "msbulk" \
 --targetserver "localhost" \
 --targetdatabase "tpch_copy" \
 --targettrusted \
 --targetschema "dbo" \
 --targettable "orders" \
 --query "select * from orders" \
 --parallelmethod "DataDriven" \
 --distributekeycolumn "o_ordermonth" \
 --datadrivenquery "SELECT to_char(d, 'YYYYMM') AS month FROM generate_series(DATE '1998-01-01', DATE '1998-12-01',INTERVAL '1 month') AS d" \
 --degree 10

This example:

Connects to a SQL Server source database on localhost
Transfers data to a SQL Server target database using BulkCopy
Uses trusted authentication for both source and target
Executes a custom query on the orders table
Uses the DataDriven distribution method
Distributes work based on the month of the order date (o_ordermonth column)
Uses a custom query to generate month values (instead of a SELECT DISTINCT o_ordermonth from the source query)
Uses 10 parallel threads (even though there are 12 months to export)

Parallel Method​

Available Methods​

Methods Comparison​

Distribute Key Column​

Degree of Parallelism​

DOP Values​

Data Driven Query​

Complete Example​

Parallel Method

Available Methods

Methods Comparison

Distribute Key Column

Degree of Parallelism

DOP Values

Data Driven Query

Complete Example