Import JSON Files to CedarDB

    Fast, parallel file import using DuckDBStream

    .\FastTransfer.exe `
      --sourceconnectiontype "duckdbstream" `
      --sourceserver ":memory:" `
      --sourceuser "your-username" `
      --sourcepassword "your-password" `
      --query "SELECT * FROM read_json('D:\path\to\files\*.json', filename=true)" `
      --targetconnectiontype "pgcopy" `
      --targetserver "your-server" `
      --targetuser "your-username" `
      --targetpassword "your-password" `
      --targetdatabase "your-database" `
      --targetschema "your-schema" `
      --targettable "your-table" `
      --method "DataDriven" `
      --distributekeycolumn "filename" `
      --datadrivenquery "select file from glob('D:\path\to\files\*.json')" `
      --degree -2 `
      --loadmode "Truncate" `
      --mapmethod "Name"

    Source - JSON (JavaScript Object Notation)

    JSON is one of the most widely used data exchange formats on the web. FastTransfer parses JSON files efficiently and can extract nested structures.

    Features:

    • JSON Lines (NDJSON) support
    • Automatic structure flattening with DuckDB read_json() syntax
    • Automatic schema detection
    • Parallel processing of multiple files
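
    To make the JSON Lines and flattening features above concrete, here is a minimal Python sketch of the idea. It is not FastTransfer's or DuckDB's implementation: the helper names `flatten` and `read_json_lines` and the dotted-key naming are assumptions chosen for illustration, and DuckDB's read_json() may represent nested objects differently (for example as STRUCT columns).

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper: the dotted-key convention is an assumption,
    not the naming scheme used by DuckDB or FastTransfer.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def read_json_lines(text):
    """Parse NDJSON text: one JSON object per non-empty line."""
    return [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]


sample = (
    '{"id": 1, "user": {"name": "ada", "geo": {"city": "Paris"}}}\n'
    '{"id": 2, "user": {"name": "bob"}}\n'
)
rows = read_json_lines(sample)
```

    Each row ends up as a flat mapping of column name to value, which is the shape a relational target table expects.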

    Processing - DuckDBStream with DataDriven

    DuckDB is a fast and efficient in-process analytical database. FastTransfer uses DuckDBStream to read multiple file formats with exceptional performance.

    Parallel Method: DataDriven (Files)

    For files, FastTransfer uses the filename as the distribution key, so that multiple files are processed in parallel.

    • Concurrent processing of multiple files
    • Ideal for batch imports
    • Automatic horizontal scaling
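
    The file-based distribution described above can be pictured with a short Python sketch: list the files, treat each filename as one unit of work, and hand the units to a pool of workers. This is only an illustration under assumptions. The functions `load_file` and `import_files` and the `degree` parameter are hypothetical names, and FastTransfer's actual scheduling is not documented here.

```python
import glob
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_file(path):
    """Read one JSON file and tag every row with its source filename.

    Assumes the file holds a JSON object or an array of JSON objects.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    records = data if isinstance(data, list) else [data]
    return [{**record, "filename": path} for record in records]


def import_files(pattern, degree=4):
    """Process every file matching `pattern`, one task per filename."""
    files = sorted(glob.glob(pattern))  # the distinct distribution keys
    with ThreadPoolExecutor(max_workers=degree) as pool:
        batches = list(pool.map(load_file, files))  # results keep file order
    return [row for batch in batches for row in batch]


# Small demonstration with three temporary files.
with tempfile.TemporaryDirectory() as folder:
    for index in range(3):
        target = os.path.join(folder, f"part{index}.json")
        with open(target, "w", encoding="utf-8") as handle:
            json.dump([{"id": index}], handle)
    rows = import_files(os.path.join(folder, "*.json"), degree=2)
```

    Because every row carries its filename, a downstream step can tell which file a row came from, which is the role the filename column plays in the command at the top of the page.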

    Destination - CedarDB

    FastTransfer loads CedarDB through the PostgreSQL-compatible COPY protocol (pgcopy). The binary COPY format applies only when the source is also PostgreSQL-compatible and pgcopy is set as both the source and target connection type.

    Loading method:

    Binary COPY Protocol

    Advantages:

    • Binary COPY for maximum performance (PostgreSQL-compatible source only, with pgcopy as both source and target connection type)
    • PostgreSQL protocol compatibility
    • Optimized for modern hardware
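
    For readers curious about what a binary COPY stream looks like on the wire, the following Python sketch encodes a few rows in the layout documented for PostgreSQL's binary COPY format: a fixed signature, a flags field, a header extension length, then length-prefixed fields per tuple and a trailer. It is an illustration, not FastTransfer's code. The function names are hypothetical, and the sketch assumes integer values target int4 columns and strings target text columns in a UTF-8 database.

```python
import struct

# 11-byte signature that opens every PostgreSQL binary COPY stream.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_field(value):
    """Encode one field as a 4-byte big-endian length followed by the data."""
    if value is None:
        return struct.pack("!i", -1)  # NULL: length -1 and no data bytes
    if isinstance(value, int):
        data = struct.pack("!i", value)  # assumption: target column is int4
    else:
        data = str(value).encode("utf-8")  # assumption: text column, UTF-8
    return struct.pack("!i", len(data)) + data


def encode_binary_copy(rows):
    """Build a complete binary COPY payload for the given rows."""
    out = bytearray(SIGNATURE)
    out += struct.pack("!ii", 0, 0)  # flags field, header extension length
    for row in rows:
        out += struct.pack("!h", len(row))  # 16-bit field count per tuple
        for value in row:
            out += encode_field(value)
    out += struct.pack("!h", -1)  # file trailer: field count of -1
    return bytes(out)


payload = encode_binary_copy([(1, "a.json"), (2, None)])
```

    Because values travel in their binary representation rather than as text to be parsed, the source and target types need to agree, which is consistent with the binary path being restricted to PostgreSQL-compatible sources.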