PostgreSQL Connector¶
The PostgreSQL connector supports reading from and writing to PostgreSQL databases using the psycopg 3.x driver.
Supported Versions¶
PostgreSQL 9.6 through 17.
Driver¶
psycopg 3.x (psycopg[binary]>=3.1). Uses the binary build for faster installation.
Connection Configuration¶
<source connector="postgresql">
<connection host="localhost"
port="5432"
database="analytics"
username="${env:PG_USER}"
password="${env:PG_PASS}" />
</source>
| Parameter | Default | Description |
|---|---|---|
host |
Required | Database hostname or IP. |
port |
5432 |
Database port. |
database |
Required | Database name. |
username |
-- | Credential via ${env:VAR}. |
password |
-- | Credential via ${env:VAR}. |
SSL / TLS¶
When encrypt="true" is set on the <connection> element, the connector uses sslmode=prefer (try SSL, fall back to plaintext). When not set, sslmode=disable is used.
TCP Keepalive¶
The connector enables TCP keepalive by default to prevent idle connection timeouts during long target-side writes:
keepalives=1keepalives_idle=30secondskeepalives_interval=10secondskeepalives_count=5
Performance Features¶
COPY Binary Protocol (Writes)¶
The PostgreSQLDataWriter uses the psycopg COPY protocol with write_row() for high-throughput bulk inserts. This is significantly faster than row-by-row INSERT statements.
Streaming Cursor (Reads)¶
Reads use server-side cursors to stream data without loading the entire result set into memory. Data is fetched in batches of batch_size rows and converted to Arrow RecordBatches.
Push-Down CAST¶
For jsonb, json, and uuid columns, the data reader issues ::text casts in the SELECT statement. This pushes string conversion to the database server, eliminating Python-level json.dumps() and str() calls in the hot path.
Vectorized Column Extraction¶
The data writer uses Arrow's to_pylist() for C-level column extraction instead of per-cell scalar.as_py() loops.
Default Schema¶
The default schema is public.
Type Mapping¶
See Type Mappings > PostgreSQL for the complete source and sink mapping tables.
Key points:
serial/bigserialare used for auto-increment columns (bigserial for int64 sources).- Default values are normalized: MySQL's
CURRENT_TIMESTAMPbecomesNOW(), bare string literals are quoted.
Known Limitations¶
- VARCHAR length is lost through the Arrow intermediate (
varchar(255)becomes Arrowstringbecomestext). - Numeric precision is not fully preserved (
DECIMAL(10,2)becomesdecimal128(38,10)thennumeric(38,10)). - Array types are stored as Arrow
string(not native Arrow arrays).
Example BDL¶
<?xml version="1.0" encoding="UTF-8"?>
<bani schemaVersion="1.0">
<project name="mysql-to-pg" />
<source connector="mysql">
<connection host="localhost" port="3306"
database="erp"
username="${env:MYSQL_USER}"
password="${env:MYSQL_PASS}" />
</source>
<target connector="postgresql">
<connection host="localhost" port="5432"
database="analytics"
username="${env:PG_USER}"
password="${env:PG_PASS}" />
</target>
<options>
<batchSize>100000</batchSize>
<parallelWorkers>4</parallelWorkers>
</options>
</bani>