Import Parquet Files to CedarDB

    Fast, parallel file import using DuckDBStream

    .\FastTransfer.exe `
      --sourceconnectiontype "duckdbstream" `
      --sourceserver ":memory:" `
      --query "SELECT * FROM read_parquet('D:\path\to\files\*.parquet', filename=true)" `
      --targetconnectiontype "pgcopy" `
      --targetserver "your-server" `
      --targetuser "your-username" `
      --targetpassword "your-password" `
      --targetdatabase "your-database" `
      --targetschema "your-schema" `
      --targettable "your-table" `
      --method "DataDriven" `
      --distributekeycolumn "filename"  `
      --datadrivenquery "select file from glob('D:\path\to\files\*.parquet')"  `
      --degree -2  `
      --loadmode "Truncate"  `
      --mapmethod "Name"

    Source - Apache Parquet

    Parquet is a columnar file format optimized for analytical processing. FastTransfer reads Parquet files through DuckDB's native, high-performance Parquet reader.

    Features:

    • Ultra-fast columnar reading
    • Integrated native compression
    • Data type preservation
    • Pushdown filtering for optimal performance
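    To illustrate what pushdown filtering means, here is a toy Python model. It is not Parquet's real on-disk format and not DuckDB's reader: data is split into row groups, each carrying per-column min/max statistics (Parquet keeps similar statistics in its file metadata). A filter lets the reader skip whole row groups whose range cannot match, and only the requested columns are touched. The `RowGroup` class and `scan` function are hypothetical names used only for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """Toy row group: column name -> values, plus per-column (min, max) stats."""

    columns: dict[str, list]
    stats: dict[str, tuple] = field(init=False)

    def __post_init__(self) -> None:
        # Computed once at "write time"; Parquet stores similar statistics in
        # its metadata so a reader can consult them without decoding the data.
        self.stats = {name: (min(vals), max(vals)) for name, vals in self.columns.items()}


def scan(row_groups: list[RowGroup], select: list[str], where_column: str,
         low, high) -> tuple[list[tuple], int]:
    """Return (rows, groups_read) for rows where low <= where_column <= high."""
    rows: list[tuple] = []
    groups_read = 0
    for rg in row_groups:
        col_min, col_max = rg.stats[where_column]
        if col_max < low or col_min > high:
            continue  # predicate pushdown: group skipped using stats only
        groups_read += 1
        keys = rg.columns[where_column]
        # Projection pushdown: only the filter column and selected columns are used.
        picked = [rg.columns[name] for name in select]
        for i, key in enumerate(keys):
            if low <= key <= high:
                rows.append(tuple(col[i] for col in picked))
    return rows, groups_read


groups = [
    RowGroup({"id": [1, 2, 3], "amount": [10, 20, 30]}),
    RowGroup({"id": [4, 5, 6], "amount": [40, 50, 60]}),
    RowGroup({"id": [7, 8, 9], "amount": [70, 80, 90]}),
]
rows, groups_read = scan(groups, select=["amount"], where_column="id", low=5, high=6)
print(rows, groups_read)  # [(50,), (60,)] 1
```

    Only the middle row group is read; the other two are ruled out by their statistics alone.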

    Processing - DuckDBStream with DataDriven

    DuckDB is a fast and efficient in-process analytical database. FastTransfer uses DuckDBStream to read multiple file formats with exceptional performance.

    Parallel Method: DataDriven (Files)

    For files, FastTransfer uses the filename as the distribution key (--distributekeycolumn "filename"), so several files are processed in parallel.

    • Concurrent processing of multiple files
    • Ideal for batch imports
    • Parallelism scales with the number of files and the --degree setting
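    The DataDriven idea can be sketched in Python as a conceptual model under stated assumptions, not FastTransfer's implementation: the list of files plays the role of the distinct distribution-key values returned by --datadrivenquery, each file becomes one work unit whose query is filtered on the filename column, and a thread pool capped at `degree` workers stands in for --degree. The helper names, the WHERE-wrapping strategy, and the placeholder `transfer` worker are all hypothetical.

```python
import glob
from concurrent.futures import ThreadPoolExecutor


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def discover_files(pattern: str) -> list[str]:
    """Rough equivalent of --datadrivenquery "select file from glob(...)"."""
    return sorted(glob.glob(pattern))


def partition_query(base_query: str, key_column: str, key_value: str) -> str:
    """Restrict the base query to one distribution-key value (assumed strategy)."""
    return (f"SELECT * FROM ({base_query}) AS src "
            f"WHERE {key_column} = {sql_literal(key_value)}")


def transfer(query: str) -> str:
    # Hypothetical placeholder: a real worker would run `query` on the source
    # and stream the rows to the target. Here it just returns the query.
    return query


def run_data_driven(files: list[str], base_query: str, key_column: str,
                    degree: int) -> list[str]:
    """One task per file, with at most `degree` (a positive int here) running at once."""
    queries = [partition_query(base_query, key_column, f) for f in files]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        # pool.map returns results in input order, regardless of finish order.
        return list(pool.map(transfer, queries))


if __name__ == "__main__":
    base = r"SELECT * FROM read_parquet('D:\path\to\files\*.parquet', filename=true)"
    files = discover_files(r"D:\path\to\files\*.parquet")
    for q in run_data_driven(files, base, "filename", 4):
        print(q)
```

    Because each file is an independent unit, adding files adds work items rather than making any single query larger, which is why this approach suits batch imports.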

    Destination - CedarDB

    FastTransfer loads data into CedarDB through PostgreSQL's COPY protocol (target connection type pgcopy). The binary COPY format is used when the source is PostgreSQL-compatible and pgcopy is set as both the source and the target connection type, ensuring maximum compatibility and performance.

    Loading method:

    Binary COPY Protocol

    Advantages:

    • Binary COPY for maximum performance (PostgreSQL-compatible source only, with pgcopy as both source and target connection type)
    • PostgreSQL protocol compatibility
    • Optimized for modern hardware
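    For readers curious what "binary COPY" looks like on the wire, the sketch below encodes rows in PostgreSQL's documented binary COPY format: an 11-byte signature, a 32-bit flags field and a 32-bit header-extension length, then for each row a 16-bit field count followed by a 32-bit length and the raw bytes of each field (length -1 for NULL), and finally a 16-bit -1 trailer, all in network byte order. It is a minimal illustration, not FastTransfer's code, and it assumes the target columns are boolean, bigint, or UTF-8 text, since the binary representation depends on each column's type.

```python
import struct

# 11-byte file signature defined by the PostgreSQL binary COPY format.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_field(value) -> bytes:
    """Encode one value as <int32 length><bytes>; NULL is length -1, no bytes."""
    if value is None:
        return struct.pack(">i", -1)
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        payload = b"\x01" if value else b"\x00"  # boolean: 1 byte
    elif isinstance(value, int):
        payload = struct.pack(">q", value)  # bigint (int8): 8 bytes, big-endian
    elif isinstance(value, str):
        payload = value.encode("utf-8")  # text: raw bytes, UTF-8 assumed
    else:
        raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    return struct.pack(">i", len(payload)) + payload


def encode_copy_binary(rows) -> bytes:
    """Build a complete binary COPY payload for an iterable of row tuples."""
    out = bytearray(SIGNATURE)
    out += struct.pack(">ii", 0, 0)  # flags = 0, header extension length = 0
    for row in rows:
        out += struct.pack(">h", len(row))  # per-row field count (int16)
        for value in row:
            out += encode_field(value)
    out += struct.pack(">h", -1)  # file trailer
    return bytes(out)
```

    The resulting bytes are what a client streams after issuing a statement such as `COPY your_table FROM STDIN WITH (FORMAT binary)` (table name hypothetical). Sending typed binary values avoids converting every value to and from text, which is where the format's speed advantage comes from.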