
Parallel Parameters

FastTransfer can parallelize data export to significantly improve performance. This section covers the parameters that control parallel execution.

Parallel Method

Use the -m or --parallelmethod parameter to specify how data will be split across parallel threads.

.\FastTransfer.exe `
...
--parallelmethod Ntile `
...

Syntax:

  • Short form: -m <method>
  • Long form: --parallelmethod <method>

Available Methods

None - No parallel processing. Data is exported sequentially.

DataDriven - Uses the distinct values of the distribute key column (or the list of values returned by --datadrivenquery) to split the export, one chunk per value. If there are more values than the degree of parallelism, throttling is applied: at most that many chunks run at the same time.

tip

You can use an expression in the distribute key column instead of a column name.
Example: YEAR(o_orderdate)
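The throttling behavior can be pictured with a small Python sketch. It is not FastTransfer's code: `export_chunk` and `data_driven_export` are hypothetical names, each "export" only builds the WHERE clause a chunk would use, and a thread pool with `dop` workers stands in for the degree of parallelism, so values beyond the pool size wait for a free worker.

```python
from concurrent.futures import ThreadPoolExecutor


def export_chunk(key_expr, value):
    # Hypothetical stand-in for one chunk query. A real export would run
    # SELECT ... WHERE <key_expr> = <value> with a bound parameter;
    # repr() is used here only to print a readable literal.
    return f"WHERE {key_expr} = {value!r}"


def data_driven_export(key_expr, values, dop):
    # One chunk per value. The pool runs at most `dop` chunks at once,
    # so when there are more values than workers the rest are queued (throttled).
    with ThreadPoolExecutor(max_workers=dop) as pool:
        # map() returns results in input order, whatever the completion order.
        return list(pool.map(lambda v: export_chunk(key_expr, v), values))


if __name__ == "__main__":
    for clause in data_driven_export("YEAR(o_orderdate)", [1995, 1996, 1997, 1998], dop=2):
        print(clause)
```

With `dop=2`, only two of the four year chunks can be in flight at any moment; the other two start as soon as a worker is released.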

Ntile - Uses the distribute key column and NTILE values to retrieve evenly sized chunks of data. Each parallel thread exports the rows that fall within a range computed from the distribution of the key values. The distribute key column does not need to be unique.
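The Ntile idea can be sketched in plain Python: sort the key values, cut them into n buckets of (nearly) equal row counts the way SQL's NTILE(n) does, and turn each bucket's upper bound into a range predicate. This is an illustration under assumptions, not FastTransfer's actual query: `ntile_boundaries` and `ntile_predicates` are hypothetical helpers, and using half-open ranges with duplicate bounds collapsed is one assumed way to keep a non-unique key from producing overlapping chunks.

```python
def ntile_boundaries(sorted_keys, n):
    """Upper bound of each bucket when sorted_keys is cut like SQL NTILE(n)."""
    base, extra = divmod(len(sorted_keys), n)
    bounds, pos = [], 0
    for bucket in range(n):
        # NTILE gives the first `extra` buckets one more row than the others.
        size = base + (1 if bucket < extra else 0)
        if size == 0:
            break
        pos += size
        bounds.append(sorted_keys[pos - 1])
    # With a non-unique key two buckets can end on the same value;
    # keep each bound once so the ranges below never overlap (assumption).
    return list(dict.fromkeys(bounds))


def ntile_predicates(key, sorted_keys, n):
    """Half-open range predicates (lower, upper] built from the bucket bounds."""
    predicates, lower = [], None
    for upper in ntile_boundaries(sorted_keys, n):
        if lower is None:
            predicates.append(f"{key} <= {upper}")
        else:
            predicates.append(f"{key} > {lower} AND {key} <= {upper}")
        lower = upper
    return predicates


if __name__ == "__main__":
    keys = sorted([8, 2, 1, 5, 2, 4, 2, 3])  # non-unique key values
    for p in ntile_predicates("o_custkey", keys, 4):
        print(p)
```

With 4 requested chunks, two buckets end on the duplicated value 2, so the sketch emits three non-overlapping predicates that still cover every row.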

RangeId - Uses the minimum and maximum values of the distribute key column to split the data into ranges. Each parallel thread exports the rows that fall within one of these ranges.
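A min/max split can be sketched as cutting the interval between the two values into contiguous ranges of equal width. The Python below is a hypothetical illustration (`range_id_predicates` is not a FastTransfer function) and assumes an integer key; how FastTransfer actually sizes its ranges is not specified here.

```python
def range_id_predicates(key, min_value, max_value, dop):
    """Cut the integer interval [min_value, max_value] into `dop` contiguous ranges."""
    if max_value < min_value:
        return []  # empty source: nothing to split
    span = max_value - min_value + 1
    # Assumption: never build more ranges than there are integers in the interval.
    dop = max(1, min(dop, span))
    base, extra = divmod(span, dop)
    predicates, lower = [], min_value
    for i in range(dop):
        width = base + (1 if i < extra else 0)
        upper = lower + width - 1
        predicates.append(f"{key} BETWEEN {lower} AND {upper}")
        lower = upper + 1
    return predicates


if __name__ == "__main__":
    # In practice min/max would come from SELECT MIN(key), MAX(key) on the source.
    for p in range_id_predicates("o_orderkey", 1, 10, 3):
        print(p)
```

Because these ranges have equal width rather than equal row counts, a skewed key can leave some threads with much more work than others; Ntile's buckets, by contrast, are built from row counts.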

Random - Requires an integer or bigint distribute key column with many distinct values (at least as many as the degree of parallelism).

Ctid - Uses an internal hidden field to retrieve chunks of data. Each parallel thread exports a portion based on the CTID range.

note

Ctid parallel method is for PostgreSQL databases only (and some PostgreSQL-compatible databases).

Physloc - Uses an internal hidden field to retrieve chunks of data. Each parallel thread exports a portion based on the Physloc range.

note

Physloc parallel method is for SQL Server databases only.

Rowid - Uses an internal hidden field to retrieve chunks of data. Each parallel thread exports a portion based on the ROWID range.

note

Rowid parallel method is for Oracle databases only.

Methods Comparison

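Method     | Parallel | Needs Distributed Column | Database Source Type
None       | No       | No                       | Any
Random     | Yes      | Yes                      | Any
DataDriven | Yes      | Yes                      | Any
RangeId    | Yes      | Yes                      | Any
Ntile      | Yes      | Yes                      | Any
Ctid       | Yes      | No                       | PostgreSQL (pgsql/pgcopy)
Physloc    | Yes      | No                       | SQL Server (mssql)
Rowid      | Yes      | No                       | Oracle (oraodp)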

Distribute Key Column

Use the -c or --distributekeycolumn parameter to define the column (or expression) on the data source that is used to split the data into several parts.

FastTransfer runs several SQL queries in parallel against the source. Each query has a WHERE clause that retrieves one part of the total data.

Information

This parameter is mandatory when using methods that require a distributed column: Random, DataDriven, RangeId, or Ntile.

# Using a column name
.\FastTransfer.exe `
...
--distributekeycolumn order_date `
...

# Using an expression
.\FastTransfer.exe `
...
--distributekeycolumn "YEAR(order_date)" `
...

Syntax:

  • Short form: -c <column_name>
  • Long form: --distributekeycolumn <column_name>

Degree of Parallelism

Use the -p or --paralleldegree parameter to control how many parallel threads will be used for the export.

DOP Values

Positive value (e.g., 4) - Uses that number of parallel threads, capped at the number of CPU cores (or threads if Hyper-Threading is enabled): a larger value is downscaled to match the available cores.

0 - Automatically aligns with the number of cores (or threads if Hyper-Threading is enabled) on the machine.

Negative value (e.g., -2) - Computed as number of cores / abs(dop). For example, if you have 16 cores and set DOP to -2, the actual DOP will be 8.
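The three rules can be summarized as a small Python function. This is a sketch for illustration, not FastTransfer's code: `effective_dop` is a hypothetical name, and rounding a negative DOP down (with a minimum of 1 thread) is an assumption, since only the 16 / 2 = 8 case is documented.

```python
import os


def effective_dop(dop, cores=None):
    """Resolve a --paralleldegree value with the rules listed above (a sketch)."""
    if cores is None:
        # os.cpu_count() reports logical CPUs, so Hyper-Threading is included.
        cores = os.cpu_count() or 1
    if dop > 0:
        return min(dop, cores)  # positive: downscaled to the available cores
    if dop == 0:
        return cores  # 0: align with the number of cores/threads
    # negative: cores / abs(dop); floor division and a floor of 1 are assumptions
    return max(1, cores // abs(dop))


if __name__ == "__main__":
    for requested in (4, 32, 0, -2, -3):
        print(requested, "->", effective_dop(requested, cores=16))
```

On a 16-core machine this prints 4, 16, 16, 8 and 5 for the five requested values; only the -3 result depends on the rounding assumption.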

# Use 8 parallel threads
.\FastTransfer.exe `
...
--paralleldegree 8 `
...

# Auto-detect based on CPU cores
.\FastTransfer.exe `
...
--paralleldegree 0 `
...

# Use half of available cores
.\FastTransfer.exe `
...
--paralleldegree -2 `
...

Syntax:

  • Short form: -p <value>
  • Long form: --paralleldegree <value>

Default: -2

Data Driven Query

Use the --datadrivenquery parameter with the DataDriven method to provide a query that returns the list of values used to split the data. This lets you filter which values are exported and used to split the data, instead of relying on every value of the distribute key column.

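# Split on the values returned by a tag list
.\FastTransfer.exe `
...
--datadrivenquery "SELECT tagname FROM tags" `
...

# Split only on the dates of the last 30 days (SQL Server syntax)
.\FastTransfer.exe `
...
--datadrivenquery "SELECT o_orderdate FROM dim_date WHERE ref_date > getdate() - 30" `
...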

Syntax:

  • Long form only: --datadrivenquery "<query>"

Merge

Use the -M or --merge parameter to specify whether the temporary files generated by the parallel export are merged into a single file or kept split.

If the parameter is omitted, distributed files are merged for local CSV and Parquet files and kept distributed for cloud destinations.

warning

The current version supports merging for CSV and Parquet formats only.

warning

Merge is not available for cloud destinations.

warning

Merge is automatic for local file destinations.

# Merge files after parallel export (default for local files)
.\FastTransfer.exe `
...
--merge true `
...

# Keep distributed files
.\FastTransfer.exe `
...
--merge false `
...

Syntax:

  • Short form: -M
  • Long form: --merge

Default: true

Complete Example

Here's a complete example using parallel parameters with the DataDriven method:

.\FastTransfer.exe `
--connectiontype "pgcopy" `
--server "localhost:15432" `
--database "tpch" `
--user "FastUser" `
--password "FastPassword" `
--query "with T1 AS (select *, to_char(o_orderdate, 'YYYYMM') o_ordermonth from tpch_10.orders) SELECT * FROM T1" `
--directory "D:\temp\tpch\orders\" `
--fileoutput "pgcopy_orders.parquet" `
--method "DataDriven" `
--distributekeycolumn "o_ordermonth" `
--datadrivenquery "SELECT to_char(d, 'YYYYMM') AS month FROM generate_series(DATE '1998-01-01', DATE '1998-12-01',INTERVAL '1 month') AS d" `
--paralleldegree 10 `
--merge "False" `
--runid "pgcopy_to_parquet_parallel12_DataDriven"

This example:

  • Connects to a PostgreSQL database on localhost:15432
  • Exports data from the tpch_10.orders table
  • Uses the DataDriven distribution method
  • Distributes work on o_ordermonth, the order month (YYYYMM) computed from o_orderdate by an expression in the source query
  • Uses --datadrivenquery to generate the month values with a lightweight query (instead of running a SELECT DISTINCT o_ordermonth over the source query)
  • Uses 10 parallel threads, even though there are 12 months to export (the extra values are throttled)
  • Keeps distributed files without merging (--merge false)
  • Assigns a custom Run ID for tracking
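As a quick check of the arithmetic behind this example, the Python sketch below rebuilds the month values returned by the --datadrivenquery and compares their count with the degree of parallelism. `month_keys` is a hypothetical helper; the sketch assumes all 10 requested threads are available (on a machine with fewer cores the DOP is downscaled) and that throttling means at most DOP chunks run at once.

```python
from datetime import date


def month_keys(start, end):
    """Same values as the example's generate_series(...) formatted with to_char(d, 'YYYYMM')."""
    keys, year, month = [], start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}{month:02d}")
        # step one month, rolling December over to January of the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


if __name__ == "__main__":
    months = month_keys(date(1998, 1, 1), date(1998, 12, 1))
    dop = 10
    print(len(months), months[0], months[-1])
    # With at most `dop` chunks running at once, the remaining values are throttled.
    print("values waiting for a free thread at start:", max(0, len(months) - dop))
```

This prints `12 199801 199812`, then reports 2 values waiting: ten months start immediately and the last two start as threads become free.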
Copyright © 2026 Architecture & Performance. Built with Docusaurus.