JSON Import with DuckDB

FastTransfer leverages DuckDB's powerful read_json function to efficiently import JSON files into any supported database. DuckDB provides advanced features like automatic schema detection, nested structure handling, and support for both JSON objects and JSON Lines (JSONL/NDJSON) formats.

DuckDB offers several advantages for JSON processing:

  • Automatic schema detection - Infers column names and types from the JSON structure (previewed in the sketch after this list)
  • High performance - Optimized JSON parser with parallel reading
  • Nested structure support - Handle complex nested objects and arrays
  • Multiple formats - Support for JSON, JSON Lines (JSONL), and NDJSON
  • Array flattening - Use unnest() to flatten nested arrays
  • In-memory processing - No need for a persistent database file
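
Before wiring a query into FastTransfer, you can check what DuckDB infers from a file directly in the DuckDB CLI. A minimal sketch, using the users.json file from the first example below:

-- Show the column names and types inferred from the JSON structure
DESCRIBE SELECT * FROM read_json('D:\DATA\users.json');

-- Preview a few rows before running the transfer
SELECT * FROM read_json('D:\DATA\users.json') LIMIT 5;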

Basic JSON Import

Single JSON File

Import a single JSON file using DuckDB's in-memory mode:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\users.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "AppDB" `
--targetschema "dbo" `
--targettable "users" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"

JSON Formats

DuckDB automatically detects the JSON format (each layout is sketched after the list below):

  • JSON Array: [{...}, {...}] - Array of objects
  • JSON Lines (JSONL): One JSON object per line
  • NDJSON: Newline-delimited JSON objects
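
For reference, the same two records laid out in each format (file contents shown as comments; a sketch, not an excerpt from the examples):

-- JSON array (format='array'):
--   [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
-- JSON Lines / NDJSON (format='newline_delimited'):
--   {"id": 1, "name": "Ada"}
--   {"id": 2, "name": "Linus"}

-- format='auto' (the default) handles either layout
SELECT * FROM read_json('D:\DATA\users.json', format='auto');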

JSON Lines File - Single Thread

Import a JSON Lines file to PostgreSQL:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\events.jsonl', format='newline_delimited')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "analytics" `
--targetschema "public" `
--targettable "events" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"

Parallel JSON Import

DataDriven Method with Calculated Column

Use a calculated distribution key for parallel processing:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\orders.jsonl')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "orders" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(order_id/10000)::int" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 12

Distribution Key Calculation

The expression (order_id/10000)::int divides order_id by 10,000 and casts the result to an integer, so roughly every 10,000 consecutive order IDs share the same value. Each distinct value becomes one chunk of work for the parallel threads, and the chunks stay balanced as long as order IDs are fairly evenly distributed.
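
You can verify the chunk sizes before launching a parallel run by grouping on the same expression in DuckDB. A minimal sketch, assuming the same orders.jsonl file and a numeric order_id column:

-- One row per chunk; the number of rows bounds the useful --degree value
SELECT (order_id/10000)::int AS chunk, count(*) AS rows_in_chunk
FROM read_json('D:\DATA\orders.jsonl')
GROUP BY chunk
ORDER BY chunk;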

Random Method on Large JSON File

Process a large JSON Lines file with Random parallelization:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\transactions.jsonl')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "transactions" `
--parallelmethod "Random" `
--distributeKeyColumn "transaction_id" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16

DataDriven Method with Custom Query

Use a custom DataDriven query to calculate optimal distribution:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\logs.jsonl') rjson" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "application_logs" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(event_timestamp::date)" `
--datadrivenquery "select distinct event_timestamp::date from read_json('D:\DATA\logs.jsonl') rjson" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 10
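
The --datadrivenquery must return the distinct values that the --distributeKeyColumn expression can take; each value becomes one unit of work. To see how many units the query above will yield, you can count them first (a sketch against the same logs.jsonl file):

-- Number of distinct dates = number of chunks handed to the 10 parallel threads
SELECT count(DISTINCT event_timestamp::date) AS chunk_count
FROM read_json('D:\DATA\logs.jsonl');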

Multiple JSON Files

Parallel Processing by Filename

Process multiple JSON files in parallel using the filename as the distribution key:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * exclude filename FROM read_json('D:\DATA\daily\*.jsonl', filename=true)" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "reports" `
--targetschema "dbo" `
--targettable "daily_metrics" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "filename" `
--datadrivenquery "select file from glob('D:\DATA\daily\*.jsonl')" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16

Filename as Distribution Key
  • Use filename=true in read_json() to add a filename column
  • Use exclude filename to remove it from the final output
  • The glob() function returns the list of files for parallel processing
  • Each thread processes one or more complete files
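
A sketch of the two pieces in isolation, assuming a D:\DATA\daily folder of JSONL files:

-- glob() lists the matching files; this is what the data-driven query feeds to the threads
SELECT file FROM glob('D:\DATA\daily\*.jsonl');

-- filename=true adds a filename column; EXCLUDE drops it from the rows that are loaded
SELECT * EXCLUDE (filename)
FROM read_json('D:\DATA\daily\*.jsonl', filename=true)
LIMIT 5;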

Nested JSON Structures

Accessing Nested Fields

Extract specific fields from nested JSON structures:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, user.name as user_name, user.email as user_email, order_date, total_amount FROM read_json('D:\DATA\orders.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "sales" `
--targetschema "dbo" `
--targettable "orders_flat" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
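
DuckDB reads nested objects as STRUCT columns, which is why dot notation works in the query above. A sketch of the assumed record shape and how to inspect it (your orders.json will differ):

-- Example record: {"id": 1, "user": {"name": "Ada", "email": "ada@example.com"}, ...}
-- The user column is inferred as STRUCT(name VARCHAR, email VARCHAR)
DESCRIBE SELECT * FROM read_json('D:\DATA\orders.json');

-- Dot notation pulls individual struct fields out into flat columns
SELECT id, user.name AS user_name, user.email AS user_email
FROM read_json('D:\DATA\orders.json');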

Flattening Nested Arrays with UNNEST

Flatten arrays within JSON objects:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT order_id, order_date, unnest(items, recursive := true) FROM read_json('D:\DATA\orders_with_items.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "order_items" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
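
With recursive := true, unnest first expands the items list into one row per element and then expands each element's struct fields into separate columns. A sketch of the assumed input and resulting shape:

-- Input record:  {"order_id": 1, "order_date": "2024-01-15",
--                 "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
-- Output rows:   (1, 2024-01-15, A1, 2) and (1, 2024-01-15, B7, 1)
SELECT order_id, order_date, unnest(items, recursive := true)
FROM read_json('D:\DATA\orders_with_items.json');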

Complex Nested Structure

Handle deeply nested JSON with multiple levels:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, address.street as street, address.city as city, address.country as country, contacts[1] as primary_contact FROM read_json('D:\DATA\customers.json')" `
--targetconnectiontype "mysqlbulk" `
--targetserver "localhost" `
--targetdatabase "crm" `
--targetuser "crmuser" `
--targetpassword "YourPassword" `
--targetschema "crm" `
--targettable "customers" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
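
Two details in the query above are easy to miss: dot notation reaches struct fields at any depth, and DuckDB list indexing is 1-based, so contacts[1] returns the first contact. A sketch with an assumed record shape:

-- {"id": 7, "name": "Acme", "address": {"street": "1 Main St", "city": "Lyon", "country": "FR"},
--  "contacts": ["alice@acme.example", "bob@acme.example"]}
SELECT id,
       name,
       address.city AS city,
       contacts[1]  AS primary_contact   -- 1-based: the first element of the list
FROM read_json('D:\DATA\customers.json');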

Advanced JSON Formats

Explicit Format Specification

Specify JSON format explicitly when needed:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\data.json', format='array', maximum_depth=10)" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "warehouse" `
--targetschema "public" `
--targettable "raw_data" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"

Format Parameters:

  • format='array' - JSON array of objects
  • format='newline_delimited' - JSON Lines (JSONL/NDJSON)
  • format='auto' - Automatic detection (default)
  • maximum_depth - Maximum nesting depth to parse
  • records='true' - Treat each JSON object as a record whose keys become columns
  • columns={...} - Explicit column names and types
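
When automatic detection picks the wrong type (for example an id that should stay VARCHAR, or a timestamp read as plain text), the columns parameter pins the schema explicitly. A minimal sketch with assumed column names:

-- Only the listed columns are read, with the given DuckDB types
SELECT *
FROM read_json('D:\DATA\data.json',
               format='array',
               columns={id: 'VARCHAR', created_at: 'TIMESTAMP', amount: 'DOUBLE'});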

Compressed JSON Files

Import gzip-compressed JSON files:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\logs.jsonl.gz', compression='gzip')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "compressed_logs" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"

Compression Support

DuckDB automatically detects compression from file extensions:

  • .gz - Gzip compression
  • .json.gz or .jsonl.gz - Compressed JSON files
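
Because detection is driven by the extension, the compression argument only matters when the file name does not reflect the actual compression. A sketch (logs_archive.jsonl is a hypothetical gzip file without a .gz suffix):

-- .gz extension: compression is detected automatically
SELECT count(*) FROM read_json('D:\DATA\logs.jsonl.gz');

-- Gzip content without the .gz suffix: state the compression explicitly
SELECT count(*) FROM read_json('D:\DATA\logs_archive.jsonl', compression='gzip');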

JSON Type Conversions

Converting JSON Fields

Transform JSON data types during import:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, CAST(created_at AS TIMESTAMP) as created_at, CAST(amount AS DECIMAL(10,2)) as amount, status FROM read_json('D:\DATA\payments.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "payments" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
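
If some records carry values that do not parse cleanly (an empty string in a numeric field, a malformed date), CAST aborts the whole query. DuckDB's TRY_CAST returns NULL for those rows instead, which is often preferable for bulk loads; a variant of the query above:

-- TRY_CAST yields NULL instead of failing the import on bad values
SELECT id,
       TRY_CAST(created_at AS TIMESTAMP) AS created_at,
       TRY_CAST(amount AS DECIMAL(10,2)) AS amount,
       status
FROM read_json('D:\DATA\payments.json');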

JSON String to Array

Extract a JSON array stored as a string field; the sketch after the example shows how to map it onto a typed list:

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, json_extract(tags, '$') as tags_array FROM read_json('D:\DATA\products.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "catalog" `
--targetschema "public" `
--targettable "products" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
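
json_extract returns a value of DuckDB's JSON type rather than a native list. If the target column should be a typed array, DuckDB's json_transform function can map the JSON onto a list type instead. A sketch assuming tags holds a JSON array of strings:

-- Turn the JSON value '["red","blue"]' into a native VARCHAR[] list
SELECT id,
       name,
       json_transform(tags, '["VARCHAR"]') AS tags_array
FROM read_json('D:\DATA\products.json');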