Import XLSX Files to Citus Data
Fast, parallel file import using DuckDBStream

.\FastTransfer.exe `
--sourceconnectiontype "duckdbstream" `
--sourceserver ":memory:" `
--query "SELECT * FROM read_xlsx('D:\path\to\files\*.xlsx', filename=true)" `
--targetconnectiontype "pgcopy" `
--targetserver "your-server" `
--targetuser "your-username" `
--targetpassword "your-password" `
--targetdatabase "your-database" `
--targetschema "your-schema" `
--targettable "your-table" `
--method "DataDriven" `
--distributekeycolumn "filename" `
--datadrivenquery "select file from glob('D:\path\to\files\*.xlsx')" `
--degree -2 `
--loadmode "Truncate" `
--mapmethod "Name"
Source - Excel (XLSX)
The Excel XLSX format is ubiquitous in enterprise environments. FastTransfer can directly read Excel files without prior conversion.
Features:
- Direct reading with DuckDB's read_xlsx() function, no Excel installation required
- Support for multiple sheets
- Automatic header detection
- Data type preservation
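A note on multiple sheets: as we understand DuckDB's excel extension, read_xlsx() reads one sheet per call. It reads the first sheet unless a `sheet` parameter names another, so loading several sheets means issuing one call per sheet. The sketch below uses only the Python standard library. It lists a workbook's sheet names straight from the XLSX package, which is a ZIP archive whose `xl/workbook.xml` part names the sheets. It then builds a single DuckDB query that stacks those sheets with `UNION ALL BY NAME`. The helpers `list_sheets`, `sql_literal` and `all_sheets_query`, and the `sheet_name` column, are hypothetical illustrations. This is not how FastTransfer handles sheets internally, and passing such a query through `--query` is an untested assumption.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML elements in xl/workbook.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheets(xlsx) -> list:
    """Return the sheet names of an .xlsx package in workbook order.

    `xlsx` may be a file path or a binary file object. No Excel install is
    needed because an .xlsx file is a ZIP archive of XML parts.
    """
    with zipfile.ZipFile(xlsx) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def all_sheets_query(path: str, sheets: list) -> str:
    """Build one DuckDB query stacking every listed sheet (columns matched by name).

    Assumption: read_xlsx() accepts a `sheet` named parameter; the extra
    sheet_name column records which sheet each row came from.
    """
    parts = [
        f"SELECT *, {sql_literal(name)} AS sheet_name "
        f"FROM read_xlsx({sql_literal(path)}, sheet = {sql_literal(name)})"
        for name in sheets
    ]
    return "\nUNION ALL BY NAME\n".join(parts)


if __name__ == "__main__":
    # Tiny in-memory package holding only the part list_sheets() reads (illustration only)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "xl/workbook.xml",
            f'<workbook xmlns="{NS["m"]}"><sheets>'
            '<sheet name="Orders" sheetId="1"/><sheet name="Returns" sheetId="2"/>'
            "</sheets></workbook>",
        )
    print(all_sheets_query(r"D:\path\to\files\book.xlsx", list_sheets(buf)))
```

`UNION ALL BY NAME` is used instead of a plain `UNION ALL` so that sheets whose columns appear in a different order still line up by column name.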
Processing - DuckDBStream with DataDriven
DuckDB is a fast, in-process analytical database. With the DuckDBStream source connection type, FastTransfer reads files through a DuckDB query (here, read_xlsx()). Any file format that DuckDB can query can therefore serve as a source.
Parallel Method: DataDriven (Files)
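For files, FastTransfer uses the filename as the distribution key. The --datadrivenquery lists the files to load (here with DuckDB's glob()), and each file is processed as a separate partition, so several files are loaded at the same time.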
- Concurrent processing of multiple files
- Ideal for batch imports
- Automatic horizontal scaling
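The following is a conceptual sketch of a DataDriven split by filename, not a reproduction of FastTransfer's internals. Each value returned by the data-driven query becomes one partition: the base query is filtered on the distribution key column for that value. A bounded pool of workers then runs the partitions. The helper names are hypothetical. `load_one` stands in for the "read this partition and COPY it to Citus" step. `degree` here is simply a positive worker count; FastTransfer's own handling of values such as `--degree -2` is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def partition_queries(base_query: str, key_column: str, key_values: list) -> list:
    """Build one query per distribution-key value: the base query filtered on that value."""
    return [
        f"SELECT * FROM ({base_query}) AS src WHERE {key_column} = {sql_literal(value)}"
        for value in key_values
    ]


def run_partitions(queries: list, load_one, degree: int) -> list:
    """Run the partition queries with at most `degree` workers at a time.

    `load_one` is a placeholder for "execute this query and copy its rows to
    the target"; results come back in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return list(pool.map(load_one, queries))


if __name__ == "__main__":
    base = r"SELECT * FROM read_xlsx('D:\path\to\files\*.xlsx', filename=true)"
    # In FastTransfer these values come from --datadrivenquery (glob() over the folder)
    files = [r"D:\path\to\files\jan.xlsx", r"D:\path\to\files\feb.xlsx"]
    for q in partition_queries(base, "filename", files):
        print(q)
```

Filtering on the filename keeps partitions independent, so a single large workbook occupies only one worker while the others keep loading. This sketch does not model whether DuckDB can skip the non-matching files for such a filter; that depends on the reader.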
Destination - Citus Data
FastTransfer loads data into Citus through PostgreSQL's COPY protocol, using the pgcopy target connection type. The binary COPY format is used only when the source is PostgreSQL-compatible and pgcopy is set as both the source and the target connection type.
Loading method:
Binary COPY Protocol (Distributed)
Advantages:
- Binary COPY for maximum throughput (only with a PostgreSQL-compatible source and pgcopy as both source and target connection type)
- Automatic distribution across shards
- Optimized for distributed tables
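PostgreSQL's COPY documentation specifies the binary COPY format in network byte order. A stream begins with an 11-byte signature, a 32-bit flags field and a 32-bit header-extension length. Each row is a 16-bit field count followed, per field, by a 32-bit byte length (-1 for NULL) and the raw value bytes. A 16-bit -1 trailer ends the stream. The sketch below encodes rows for an assumed two-column table (int4, text) to show that layout. It is an illustration of the wire format, not FastTransfer's encoder; the mapping of Python `int` to int4 and everything else to text is an assumption.

```python
import struct

# 11-byte binary COPY signature defined by PostgreSQL
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_field(value) -> bytes:
    """Encode one field as a 32-bit length followed by its bytes (-1 length for NULL)."""
    if value is None:
        return struct.pack("!i", -1)
    if isinstance(value, int):
        payload = struct.pack("!i", value)  # assumption: ints target an int4 column
    else:
        payload = str(value).encode("utf-8")  # assumption: everything else targets text
    return struct.pack("!i", len(payload)) + payload


def encode_copy_binary(rows) -> bytes:
    """Build a complete binary COPY stream: header, one tuple per row, trailer."""
    out = bytearray(SIGNATURE)
    out += struct.pack("!ii", 0, 0)  # flags (no OIDs), header extension length 0
    for row in rows:
        out += struct.pack("!h", len(row))  # field count for this tuple
        for value in row:
            out += encode_field(value)
    out += struct.pack("!h", -1)  # file trailer
    return bytes(out)


if __name__ == "__main__":
    data = encode_copy_binary([(1, "jan.xlsx"), (2, None)])
    print(len(data), data[:11])
```

Such a stream would be consumed by `COPY ... FROM STDIN WITH (FORMAT binary)`. The server expects each field in its column type's exact binary representation (a 4-byte value for int4, for instance), which plausibly explains why the binary path is limited to PostgreSQL-compatible sources.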

