JSON Import with DuckDB
FastTransfer leverages DuckDB's powerful read_json function to efficiently import JSON files into any supported target database. DuckDB provides advanced features such as automatic schema detection, nested structure handling, and support for both standard JSON and JSON Lines (JSONL/NDJSON) formats.
DuckDB offers several advantages for JSON processing:
- Automatic schema detection - Automatically infers column types from JSON structure
- High performance - Optimized JSON parser with parallel reading
- Nested structure support - Handle complex nested objects and arrays
- Multiple formats - Support for JSON, JSON Lines (JSONL), and NDJSON
- Array flattening - Use unnest() to flatten nested arrays
- In-memory processing - No need for a persistent database file
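Because schema detection happens at query time, it can help to preview the inferred schema before wiring a file into FastTransfer. A minimal sketch, run directly in the DuckDB CLI against the users.json file used below:

-- Preview the column names and types DuckDB infers from the file
DESCRIBE SELECT * FROM read_json('D:\DATA\users.json');

-- Spot-check a few rows before running the full transfer
SELECT * FROM read_json('D:\DATA\users.json') LIMIT 5;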
Basic JSON Import
Single JSON File
Import a single JSON file using DuckDB's in-memory mode:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\users.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "AppDB" `
--targetschema "dbo" `
--targettable "users" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
DuckDB automatically detects the JSON format:
- JSON Array: [{...}, {...}] - An array of objects
- JSON Lines (JSONL): One JSON object per line
- NDJSON: Newline-delimited JSON objects
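When auto-detection guesses wrong, the format can be forced explicitly with the format parameter. A minimal sketch, with the file layouts shown in the comments:

-- users.json (JSON array):    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
-- events.jsonl (JSON Lines):  {"id": 1, "name": "a"}
--                             {"id": 2, "name": "b"}
SELECT * FROM read_json('D:\DATA\users.json', format='array');
SELECT * FROM read_json('D:\DATA\events.jsonl', format='newline_delimited');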
JSON Lines File - Single Thread
Import a JSON Lines file to PostgreSQL:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\events.jsonl', format='newline_delimited')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "analytics" `
--targetschema "public" `
--targettable "events" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Parallel JSON Import
DataDriven Method with Calculated Column
Use a calculated distribution key for parallel processing:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\orders.jsonl')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "orders" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(order_id/10000)::int" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 12
The expression (order_id/10000)::int divides order_id by 10,000 and casts the result to an integer, so each bucket covers roughly 10,000 consecutive order IDs, giving the parallel threads evenly sized chunks.
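To check that the key really produces balanced chunks, you can preview the bucket sizes in DuckDB before launching the transfer. A quick sketch:

-- Count rows per distribution bucket; heavily skewed counts mean uneven threads
SELECT (order_id/10000)::int AS bucket, count(*) AS rows_in_bucket
FROM read_json('D:\DATA\orders.jsonl')
GROUP BY bucket
ORDER BY bucket;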
Random Method on Large JSON File
Process a large JSON Lines file with Random parallelization:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\transactions.jsonl')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "transactions" `
--parallelmethod "Random" `
--distributeKeyColumn "transaction_id" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16
DataDriven Method with Custom Query
Use a custom DataDriven query to calculate optimal distribution:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\logs.jsonl') rjson" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "application_logs" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "(event_timestamp::date)" `
--datadrivenquery "select distinct event_timestamp::date from read_json('D:\DATA\logs.jsonl') rjson" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 10
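The number of distinct values returned by the DataDriven query caps the effective parallelism, so it should be at least as large as --degree. A quick sketch to check:

-- Each distinct date becomes one unit of work; fewer than 10 dates would
-- leave some of the 10 requested threads idle
SELECT count(DISTINCT event_timestamp::date) AS work_units
FROM read_json('D:\DATA\logs.jsonl');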
Multiple JSON Files
Parallel Processing by Filename
Process multiple JSON files in parallel using the filename as the distribution key:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * exclude filename FROM read_json('D:\DATA\daily\*.jsonl', filename=true)" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "reports" `
--targetschema "dbo" `
--targettable "daily_metrics" `
--parallelmethod "DataDriven" `
--distributeKeyColumn "filename" `
--datadrivenquery "select file from glob('D:\DATA\daily\*.jsonl')" `
--mapmethod "Name" `
--loadmode "Truncate" `
--degree 16
- Use filename=true in read_json() to add a filename column
- Use exclude filename to remove it from the final output
- The glob() function returns the list of files for parallel processing (see the sketch after this list)
- Each thread processes one or more complete files
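A minimal sketch of the two building blocks, run directly in DuckDB:

-- glob() returns one row per matching file, in a column named 'file'
SELECT file FROM glob('D:\DATA\daily\*.jsonl');

-- filename=true appends the source path as a 'filename' column;
-- EXCLUDE drops it again from the rows that are actually loaded
SELECT * EXCLUDE filename
FROM read_json('D:\DATA\daily\*.jsonl', filename=true)
LIMIT 5;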
Nested JSON Structures
Accessing Nested Fields
Extract specific fields from nested JSON structures:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, user.name as user_name, user.email as user_email, order_date, total_amount FROM read_json('D:\DATA\orders.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "sales" `
--targetschema "dbo" `
--targettable "orders_flat" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Flattening Nested Arrays with UNNEST
Flatten arrays within JSON objects:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT order_id, order_date, unnest(items, recursive := true) FROM read_json('D:\DATA\orders_with_items.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "sales" `
--targetschema "public" `
--targettable "order_items" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
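The effect of recursive := true is easy to see with inline data. A minimal sketch with hypothetical item fields sku and qty:

-- Without recursive := true, unnest() yields one struct column per item;
-- with it, each struct field becomes its own column
SELECT order_id, unnest(items, recursive := true)
FROM (SELECT 1 AS order_id, [{'sku': 'A-1', 'qty': 2}, {'sku': 'B-7', 'qty': 1}] AS items);
-- Result: two rows with columns order_id, sku, qty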
Complex Nested Structure
Handle deeply nested JSON with multiple levels:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, address.street as street, address.city as city, address.country as country, contacts[1] as primary_contact FROM read_json('D:\DATA\customers.json')" `
--targetconnectiontype "mysqlbulk" `
--targetserver "localhost" `
--targetdatabase "crm" `
--targetuser "crmuser" `
--targetpassword "YourPassword" `
--targetschema "crm" `
--targettable "customers" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Advanced JSON Formats
Explicit Format Specification
Specify JSON format explicitly when needed:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_json('D:\DATA\data.json', format='array', maximum_depth=10)" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "warehouse" `
--targetschema "public" `
--targettable "raw_data" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
Format Parameters:
- format='array' - JSON array of objects
- format='newline_delimited' - JSON Lines (JSONL/NDJSON)
- format='auto' - Automatic detection (default)
- maximum_depth - Maximum nesting depth to parse
- records='true' - Unpack the fields of each JSON object into individual columns
- columns={...} - Explicit column names and types (see the sketch below)
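When automatic inference picks an unwanted type, the columns parameter pins the schema explicitly and skips inference for the listed columns. A minimal sketch with hypothetical column names:

SELECT *
FROM read_json('D:\DATA\data.json',
               format='array',
               columns={id: 'INTEGER', name: 'VARCHAR', created_at: 'TIMESTAMP'});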
Compressed JSON Files
Import gzip-compressed JSON files:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * from read_json('D:\DATA\logs.jsonl.gz', compression='gzip')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "logging" `
--targetschema "dbo" `
--targettable "compressed_logs" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
DuckDB automatically detects compression from file extensions:
- .gz - Gzip compression
- .json.gz or .jsonl.gz - Compressed JSON files
JSON Type Conversions
Converting JSON Fields
Transform JSON data types during import:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, CAST(created_at AS TIMESTAMP) as created_at, CAST(amount AS DECIMAL(10,2)) as amount, status FROM read_json('D:\DATA\payments.json')" `
--targetconnectiontype "msbulk" `
--targetserver "localhost" `
--targettrusted `
--targetdatabase "finance" `
--targetschema "dbo" `
--targettable "payments" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
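With messy real-world data, a plain CAST aborts the transfer on the first value that fails to parse; DuckDB's TRY_CAST returns NULL for such values instead, which can be the safer choice for optional fields. A minimal sketch:

-- Malformed timestamps or amounts load as NULL instead of failing the run
SELECT id,
       TRY_CAST(created_at AS TIMESTAMP) AS created_at,
       TRY_CAST(amount AS DECIMAL(10,2)) AS amount,
       status
FROM read_json('D:\DATA\payments.json');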
JSON String to Array
Convert JSON string fields to arrays:
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT id, name, json_extract(tags, '$') as tags_array FROM read_json('D:\DATA\products.json')" `
--targetconnectiontype "pgcopy" `
--targetserver "localhost:5432" `
--targetuser "postgres" `
--targetpassword "YourPassword" `
--targetdatabase "catalog" `
--targetschema "public" `
--targettable "products" `
--parallelmethod "None" `
--mapmethod "Name" `
--loadmode "Truncate"
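Note that json_extract returns a JSON-typed value. If the target column should be a native list instead, one option is DuckDB's json_transform function (alias from_json), which converts a JSON value according to a structure template. A sketch assuming tags holds a JSON array of strings:

-- Convert the JSON string '["red","blue"]' into a native VARCHAR[] list
SELECT id, name, from_json(tags, '["VARCHAR"]') AS tags_array
FROM read_json('D:\DATA\products.json');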