Skip to content

PostgreSQL Connector

The PostgreSQL connector supports reading from and writing to PostgreSQL databases using the psycopg 3.x driver.


Supported Versions

PostgreSQL 9.6 through 17.

Driver

psycopg 3.x (psycopg[binary]>=3.1). Uses the binary build for faster installation.


Connection Configuration

<source connector="postgresql">
  <connection host="localhost"
              port="5432"
              database="analytics"
              username="${env:PG_USER}"
              password="${env:PG_PASS}" />
</source>
Parameter Default Description
host Required Database hostname or IP.
port 5432 Database port.
database Required Database name.
username -- Credential via ${env:VAR}.
password -- Credential via ${env:VAR}.

SSL / TLS

When encrypt="true" is set on the <connection> element, the connector uses sslmode=prefer (try SSL, fall back to plaintext). When not set, sslmode=disable is used.

TCP Keepalive

The connector enables TCP keepalive by default to prevent idle connection timeouts during long target-side writes:

  • keepalives=1
  • keepalives_idle=30 seconds
  • keepalives_interval=10 seconds
  • keepalives_count=5

Performance Features

COPY Binary Protocol (Writes)

The PostgreSQLDataWriter uses the psycopg COPY protocol with write_row() for high-throughput bulk inserts. This is significantly faster than row-by-row INSERT statements.

Streaming Cursor (Reads)

Reads use server-side cursors to stream data without loading the entire result set into memory. Data is fetched in batches of batch_size rows and converted to Arrow RecordBatches.

Push-Down CAST

For jsonb, json, and uuid columns, the data reader issues ::text casts in the SELECT statement. This pushes string conversion to the database server, eliminating Python-level json.dumps() and str() calls in the hot path.

Vectorized Column Extraction

The data writer uses Arrow's to_pylist() for C-level column extraction instead of per-cell scalar.as_py() loops.


Default Schema

The default schema is public.


Type Mapping

See Type Mappings > PostgreSQL for the complete source and sink mapping tables.

Key points:

  • serial / bigserial are used for auto-increment columns (bigserial for int64 sources).
  • Default values are normalized: MySQL's CURRENT_TIMESTAMP becomes NOW(), bare string literals are quoted.

Known Limitations

  • VARCHAR length is lost through the Arrow intermediate (varchar(255) becomes Arrow string becomes text).
  • Numeric precision is not fully preserved (DECIMAL(10,2) becomes decimal128(38,10) then numeric(38,10)).
  • Array types are stored as Arrow string (not native Arrow arrays).

Example BDL

<?xml version="1.0" encoding="UTF-8"?>
<bani schemaVersion="1.0">
  <project name="mysql-to-pg" />
  <source connector="mysql">
    <connection host="localhost" port="3306"
                database="erp"
                username="${env:MYSQL_USER}"
                password="${env:MYSQL_PASS}" />
  </source>
  <target connector="postgresql">
    <connection host="localhost" port="5432"
                database="analytics"
                username="${env:PG_USER}"
                password="${env:PG_PASS}" />
  </target>
  <options>
    <batchSize>100000</batchSize>
    <parallelWorkers>4</parallelWorkers>
  </options>
</bani>