JSON Import with DuckDB
FastTransfer leverages DuckDB's powerful read_json function to efficiently import JSON files into any supported target database. DuckDB provides advanced features such as automatic schema detection, nested structure handling, and support for both standard JSON and JSON Lines (JSONL/NDJSON) formats.
DuckDB offers several advantages for JSON processing:
- Automatic schema detection - Automatically infers column types from JSON structure
- High performance - Optimized JSON parser with parallel reading
- Nested structure support - Handle complex nested objects and arrays
- Multiple formats - Support for JSON, JSON Lines (JSONL), and NDJSON
- Array flattening - Use unnest() to flatten nested arrays
- In-memory processing - No need for a persistent database file
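Because schema detection happens at query time, it can help to preview the inferred schema before wiring a file into FastTransfer. A minimal sketch, run directly in the DuckDB CLI against the users.json file used below:

-- Preview the column names and types DuckDB infers from the file
DESCRIBE SELECT * FROM read_json('D:\DATA\users.json');

-- Spot-check a few rows before running the full transfer
SELECT * FROM read_json('D:\DATA\users.json') LIMIT 5;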
Basic JSON Import
Single JSON File
Import a single JSON file using DuckDB's in-memory mode:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\users.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "AppDB" `
--targetschema "dbo" `
--targettable "users" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
DuckDB automatically detects the JSON format:
- JSON Array: [{...}, {...}] - An array of objects
- JSON Lines (JSONL): One JSON object per line
- NDJSON: Newline-delimited JSON objects
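When auto-detection guesses wrong, the format can be forced explicitly with the format parameter. A minimal sketch, with the file layouts shown in the comments:

-- users.json (JSON array):    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
-- events.jsonl (JSON Lines):  {"id": 1, "name": "a"}
--                             {"id": 2, "name": "b"}
SELECT * FROM read_json('D:\DATA\users.json', format='array');
SELECT * FROM read_json('D:\DATA\events.jsonl', format='newline_delimited');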
JSON Lines File - Single Thread
Import a JSON Lines file to PostgreSQL:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\events.jsonl', format='newline_delimited')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "analytics" `
--targetschema "public" `
--targettable "events" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Parallel JSON Import
DataDriven Method with Calculated Column
Use a calculated distribution key for parallel processing:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\orders.jsonl')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "orders" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(order_id/10000)::int" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 12
The expression (order_id/10000)::int divides order_id by 10,000 and casts the result to an integer, so each bucket covers roughly 10,000 consecutive order IDs, giving the parallel threads evenly sized chunks.
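To check that the key really produces balanced chunks, you can preview the bucket sizes in DuckDB before launching the transfer. A quick sketch:

-- Count rows per distribution bucket; heavily skewed counts mean uneven threads
SELECT (order_id/10000)::int AS bucket, count(*) AS rows_in_bucket
FROM read_json('D:\DATA\orders.jsonl')
GROUP BY bucket
ORDER BY bucket;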
Random Method on Large JSON File
Process a large JSON Lines file with Random parallelization:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\transactions.jsonl')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "transactions" `
--parallelmethod "Random" `
--distributeKeyColumn "transaction_id" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16
DataDriven Method with Custom Query
Use a custom DataDriven query to calculate optimal distribution:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\logs.jsonl') rjson" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "application_logs" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(event_timestamp::date)" `
--datadrivenquery "select distinct event_timestamp::date from read_json('D:\DATA\logs.jsonl') rjson" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 10
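The number of distinct values returned by the DataDriven query caps the effective parallelism, so it should be at least as large as --degree. A quick sketch to check:

-- Each distinct date becomes one unit of work; fewer than 10 dates would
-- leave some of the 10 requested threads idle
SELECT count(DISTINCT event_timestamp::date) AS work_units
FROM read_json('D:\DATA\logs.jsonl');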
Multiple JSON Files
Parallel Processing by Filename
Process multiple JSON files in parallel using the filename as the distribution key:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * exclude filename FROM read_json('D:\DATA\daily\*.jsonl', filename=true)" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "reports" `
--targetschema "dbo" `
--targettable "daily_metrics" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "filename" `
--datadrivenquery "select file from glob('D:\DATA\daily\*.jsonl')" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16
- Use filename=true in read_json() to add a filename column
- Use exclude filename to remove it from the final output
- The glob() function returns the list of files for parallel processing (see the sketch after this list)
- Each thread processes one or more complete files
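A minimal sketch of the two building blocks, run directly in DuckDB:

-- glob() returns one row per matching file, in a column named 'file'
SELECT file FROM glob('D:\DATA\daily\*.jsonl');

-- filename=true appends the source path as a 'filename' column;
-- EXCLUDE drops it again from the rows that are actually loaded
SELECT * EXCLUDE filename
FROM read_json('D:\DATA\daily\*.jsonl', filename=true)
LIMIT 5;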
Nested JSON Structures
Accessing Nested Fields
Extract specific fields from nested JSON structures:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, user.name as user_name, user.email as user_email, order_date, total_amount FROM read_json('D:\DATA\orders.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "sales" `
--targetschema "dbo" `
--targettable "orders_flat" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Flattening Nested Arrays with UNNEST
Flatten arrays within JSON objects:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT order_id, order_date, unnest(items, recursive := true) FROM read_json('D:\DATA\orders_with_items.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "order_items" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
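The effect of recursive := true is easy to see with inline data. A minimal sketch with hypothetical item fields sku and qty:

-- Without recursive := true, unnest() yields one struct column per item;
-- with it, each struct field becomes its own column
SELECT order_id, unnest(items, recursive := true)
FROM (SELECT 1 AS order_id, [{'sku': 'A-1', 'qty': 2}, {'sku': 'B-7', 'qty': 1}] AS items);
-- Result: two rows with columns order_id, sku, qty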
Complex Nested Structure
Handle deeply nested JSON with multiple levels:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, address.street as street, address.city as city, address.country as country, contacts[1] as primary_contact FROM read_json('D:\DATA\customers.json')" `
--targetconnectiontype "mysqlbulk" `
--targetserver "localhost" `
--targetdatabase "crm" `
--targetuser "crmuser" `
--targetpassword "YourPassword" `
--targetschema "crm" `
--targettable "customers" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Advanced JSON Formats
Explicit Format Specification
Specify JSON format explicitly when needed:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\data.json', format='array', maximum_depth=10)" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "warehouse" `
--targetschema "public" `
--targettable "raw_data" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Format Parameters:
- format='array' - JSON array of objects
- format='newline_delimited' - JSON Lines (JSONL/NDJSON)
- format='auto' - Automatic detection (default)
- maximum_depth - Maximum nesting depth to parse
- records='true' - Unpack the fields of each JSON object into individual columns
- columns={...} - Explicit column names and types (see the sketch below)
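When automatic inference picks an unwanted type, the columns parameter pins the schema explicitly and skips inference for the listed columns. A minimal sketch with hypothetical column names:

SELECT *
FROM read_json('D:\DATA\data.json',
               format='array',
               columns={id: 'INTEGER', name: 'VARCHAR', created_at: 'TIMESTAMP'});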
Compressed JSON Files
Import gzip-compressed JSON files:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\logs.jsonl.gz', compression='gzip')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "compressed_logs" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
DuckDB automatically detects compression from file extensions:
- .gz - Gzip compression
- .json.gz or .jsonl.gz - Compressed JSON files
JSON Type Conversions
Converting JSON Fields
Transform JSON data types during import:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, CAST(created_at AS TIMESTAMP) as created_at, CAST(amount AS DECIMAL(10,2)) as amount, status FROM read_json('D:\DATA\payments.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "payments" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
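With messy real-world data, a plain CAST aborts the transfer on the first value that fails to parse; DuckDB's TRY_CAST returns NULL for such values instead, which can be the safer choice for optional fields. A minimal sketch:

-- Malformed timestamps or amounts load as NULL instead of failing the run
SELECT id,
       TRY_CAST(created_at AS TIMESTAMP) AS created_at,
       TRY_CAST(amount AS DECIMAL(10,2)) AS amount,
       status
FROM read_json('D:\DATA\payments.json');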
JSON String to Array
Convert JSON string fields to arrays:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, json_extract(tags, '$') as tags_array FROM read_json('D:\DATA\products.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "catalog" `
--targetschema "public" `
--targettable "products" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
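Note that json_extract returns a JSON-typed value. If the target column should be a native list instead, one option is DuckDB's json_transform function (alias from_json), which converts a JSON value according to a structure template. A sketch assuming tags holds a JSON array of strings:

-- Convert the JSON string '["red","blue"]' into a native VARCHAR[] list
SELECT id, name, from_json(tags, '["VARCHAR"]') AS tags_array
FROM read_json('D:\DATA\products.json');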