Parallel Parameters
FastTransfer can parallelize data export to significantly improve performance. This section covers the parameters that control parallel execution.
Parallel Method
Use the -M or --parallelmethod parameter to specify how data will be split across parallel threads.
- Windows
- Linux
.\FastTransfer.exe `
...
--parallelmethod Ntile `
...
./FastTransfer \
...
--parallelmethod Ntile \
...
Syntax:
- Short form:
-m <method> - Long form:
--parallelmethod <method>
Available Methods
None - No parallelism.
DataDriven - Use the distinct value of the distributeKeyColumn (which can be a column or an expression) to distribute data.
You can use an expression in the distribute key column instead of a column name.
Example: YEAR(o_orderdate)
Ctid - Recommended and exclusive for PostgreSQL and PostgreSQL compatible databases. Use the Ctid pseudo column (for pgsql and pgcopy source only).
Ctid parallel method is for PostgreSQL databases only.
Physloc - Uses an internal hidden field to retrieve chunks of data. Each parallel thread exports a portion based on the Physloc range.
Physloc parallel method is for SQL Server databases only.
Random - Use a modulo on the distributeKeyColumn to distribute data.
Rowid - For Oracle sources only: use rowid slices.
Rowid parallel method is for Oracle databases only.
RangeId - Use a numeric range to distribute data (useful with an identity column or sequence without gaps. The column must be numerical).
Ntile - Use the ntile function to distribute data evenly. Data can be numerical, date, datetime or string.
NZDataSlice - Netezza source only. Use the data slices to distribute data retrieval.
NZDataSlice parallel method is for Netezza databases only.
Methods Comparison
| Method | Parallel | Needs Distributed Column | Database Source Type |
|---|---|---|---|
None | Any | ||
DataDriven | Any | ||
Ctid | PostgreSQL (pgsql/pgcopy) | ||
Physloc | SQL Server (mssql) | ||
Random | Any | ||
Rowid | Oracle (oraodp) | ||
RangeId | Any | ||
Ntile | Any | ||
NZDataSlice | Netezza |
Default Value: None
If you want to use a parallel export/import use other than None. Try to use Parallelism only if you have a large amount of data to transfer (more than 1M cells).
Distribute Key Column
Use the -K or --distributekeycolumn parameter to define the column (or computation) on the data source that will be used to split the data into several parts.
FastTransfer will use SQL queries that run in parallel against the source. Each query will have a WHERE clause that retrieves a part of the total data.
This parameter is mandatory when using methods that require a distributed column: Random, DataDriven, RangeId, or Ntile.
- Windows
- Linux
# Using a column name
.\FastTransfer.exe `
...
--distributekeycolumn order_date `
...
# Using an expression
.\FastTransfer.exe `
...
--distributekeycolumn "YEAR(order_date)" `
...
# Using a column name
./FastTransfer \
...
--distributekeycolumn order_date \
...
# Using an expression
./FastTransfer \
...
--distributekeycolumn "YEAR(order_date)" \
...
Syntax:
- Short form:
-K <column_name> - Long form:
--distributekeycolumn <column_name>
Degree of Parallelism
Use the -P or --degree parameter to control how many parallel threads will be used for the export.
DOP Values
Positive value (e.g., 4) - Uses exactly that number of parallel threads. If greater than the number of CPU cores/threads, it will be downscaled to match available cores.
0 - Automatically aligns with the number of cores (or threads if Hyper-Threading is enabled) on the machine.
Negative value (e.g., -2) - Computed as number of cores / abs(dop). For example, if you have 16 cores and set DOP to -2, the actual DOP will be 8.
- Windows
- Linux
# Use 8 parallel threads
.\FastTransfer.exe `
...
--degree 8 `
...
# Auto-detect based on CPU cores
.\FastTransfer.exe `
...
--degree 0 `
...
# Use half of available cores
.\FastTransfer.exe `
...
--degree -2 `
...
# Use 8 parallel threads
./FastTransfer \
...
--degree 8 \
...
# Auto-detect based on CPU cores
./FastTransfer \
...
--degree 0 \
...
# Use half of available cores
./FastTransfer \
...
--degree -2 \
...
Syntax:
- Short form:
-P <value> - Long form:
--degree <value>
Default: -2
Data Driven Query
Use the -Q or --datadrivenquery parameter when using the DataDriven method to provide a query that returns the list of values that will be used to split the data. This allows you to filter the values that will be exported and used to split the data.
- Windows
- Linux
.\FastTransfer.exe `
...
--datadrivenquery "SELECT tagname FROM tags" `
...
./FastTransfer \
...
--datadrivenquery "SELECT tagname FROM tags" \
...
- Windows
- Linux
.\FastTransfer.exe `
...
--datadrivenquery "SELECT o_orderdate FROM dim_date where ref_date > getdate() - 30" `
...
./FastTransfer \
...
--datadrivenquery "SELECT o_orderdate FROM dim_date where ref_date > getdate() - 30" \
...
Syntax:
- Short form:
-Q "<value>" - Long form only:
--datadrivenquery "<query>"
Complete Example
Here's a complete example using parallel parameters with the DataDriven method:
- Windows
- Linux
.\FastTransfer.exe `
--sourceconnectiontype "mssql" `
--sourceserver "localhost" `
--sourcedatabase "tpch" `
--sourcetrusted `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targetdatabase "tpch_copy" `
--targettrusted `
--targetschema "dbo" `
--targettable "orders" `
--query "select * from orders" `
--parallelmethod "DataDriven" `
--distributekeycolumn "o_ordermonth" `
--datadrivenquery "SELECT to_char(d, 'YYYYMM') AS month FROM generate_series(DATE '1998-01-01', DATE '1998-12-01',INTERVAL '1 month') AS d" `
--degree 10
./FastTransfer \
--sourceconnectiontype "mssql" \
--sourceserver "localhost" \
--sourcedatabase "tpch" \
--sourcetrusted \
--targetconnectiontype "msbulk" \
--targetserver "localhost" \
--targetdatabase "tpch_copy" \
--targettrusted \
--targetschema "dbo" \
--targettable "orders" \
--query "select * from orders" \
--parallelmethod "DataDriven" \
--distributekeycolumn "o_ordermonth" \
--datadrivenquery "SELECT to_char(d, 'YYYYMM') AS month FROM generate_series(DATE '1998-01-01', DATE '1998-12-01',INTERVAL '1 month') AS d" \
--degree 10
This example:
- Connects to a SQL Server source database on localhost
- Transfers data to a SQL Server target database using BulkCopy
- Uses trusted authentication for both source and target
- Executes a custom query on the orders table
- Uses the DataDriven distribution method
- Distributes work based on the month of the order date (o_ordermonth column)
- Uses a custom query to generate month values (instead of a
SELECT DISTINCT o_ordermonthfrom the source query) - Uses 10 parallel threads (even though there are 12 months to export)