Import JSON Files to Citus Data
Fast, parallel file import using DuckDBStream

Terminal
.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--sourceuser "your-username" `
--sourcepassword "your-password" `
--query "SELECT * FROM read_json('D:\path\to\files\*.json', filename=true)" `
--targetconnectiontype "pgcopy" `
--targetserver "your-server" `
--targetuser "your-username" `
--targetpassword "your-password" `
--targetdatabase "your-database" `
--targetschema "your-schema" `
--targettable "your-table" `
--method "DataDriven" `
--distributekeycolumn "filename" `
--datadrivenquery "select file from glob('D:\path\to\files\*.json')" `
--degree -2 `
--loadmode "Truncate" `
--mapmethod "Name"

Source - JSON (JavaScript Object Notation)
JSON is one of the most widely used data exchange formats on the web. FastTransfer parses JSON files efficiently and can extract nested structures.
Features:
- JSON Lines (NDJSON) support
- Automatic structure flattening with DuckDB read_json() syntax
- Automatic schema detection
- Parallel processing of multiple files
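To make the terms in this list concrete, the following Python sketch shows what a JSON Lines file looks like and what flattening a nested record means. It uses only the standard library and illustrates the concept; it is not how DuckDB's read_json() or FastTransfer is implemented. The `flatten` and `read_json_lines` helpers, the underscore-joined column names, and the sample records are all assumptions made for this example.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level (hypothetical helper).

    Nested keys are joined with "_" here; this naming is an assumption
    for the example, not DuckDB's convention.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def read_json_lines(text):
    """Parse JSON Lines (NDJSON): one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two made-up records, one per line, as they would appear in an NDJSON file.
SAMPLE_NDJSON = (
    '{"id": 1, "customer": {"name": "Ada", "address": {"city": "Paris"}}}\n'
    '{"id": 2, "customer": {"name": "Linus", "address": {"city": "Oslo"}}}\n'
)

rows = [flatten(record) for record in read_json_lines(SAMPLE_NDJSON)]

# A naive form of schema detection: the union of keys seen across all rows.
columns = sorted({column for row in rows for column in row})
```

Each nested object becomes a set of flat columns, which is the shape a relational target table expects.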
Processing - DuckDBStream with DataDriven
DuckDB is a fast and efficient in-process analytical database. FastTransfer uses DuckDBStream to read multiple file formats with exceptional performance.
Parallel Method: DataDriven (Files)
For files, FastTransfer uses the filename as the distribution key, so multiple files are processed in parallel.
- Concurrent processing of multiple files
- Ideal for batch imports
- Automatic horizontal scaling
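The idea of splitting work by filename can be sketched in a few lines of Python. This is a conceptual illustration only: FastTransfer's internals are not described here, and the helper names (`list_files`, `load_file`, `load_all`), the thread pool, and the worker count of 4 are assumptions chosen for the example. Each matched file is treated as an independent unit of work, and every record is tagged with the file it came from.

```python
import glob
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_files(pattern):
    """Return the distinct files matching the pattern (the distribution keys)."""
    return sorted(glob.glob(pattern))


def load_file(path):
    """Read one JSON file (an array of objects) and tag rows with its filename."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [dict(record, filename=path) for record in records]


def load_all(pattern, workers=4):
    """Process each matched file as its own task in a thread pool."""
    files = list_files(pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = list(pool.map(load_file, files))
    # Concatenate the per-file results into a single list of rows.
    return [row for rows in per_file for row in rows]


def demo():
    """Write three small sample files to a temp folder and load them in parallel."""
    with tempfile.TemporaryDirectory() as folder:
        for index in range(3):
            path = os.path.join(folder, f"part{index}.json")
            with open(path, "w", encoding="utf-8") as handle:
                json.dump([{"id": index}], handle)
        return load_all(os.path.join(folder, "*.json"))
```

Because no two tasks share a file, the workers never contend for the same input, which is why a per-file key suits batch imports of many files.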
Destination - Citus Data
When the source is PostgreSQL-compatible and pgcopy is set as both the source and target connection type, FastTransfer loads Citus through PostgreSQL's binary COPY protocol.
Loading method:
Binary COPY Protocol (Distributed)
Advantages:
- Binary COPY for maximum performance (PostgreSQL-compatible source only, with pgcopy as both source and target connection type)
- Automatic distribution across shards
- Optimized for distributed tables
