Skip to content

Connectors Overview

Bani ships with 5 built-in connectors, each implementing both source (read) and sink (write) interfaces. Connectors are discovered via Python entry points and can be extended by third-party packages.


Connector Comparison

Connector Supported Versions Python Driver Source Sink Key Performance Feature
PostgreSQL 9.6 -- 17 psycopg 3.x Yes Yes COPY binary protocol for writes
MySQL 5.5 -- 8.4 PyMySQL Yes Yes LOAD DATA LOCAL INFILE with executemany fallback
SQL Server 2019 -- 2022 pyodbc (preferred) / pymssql (fallback) Yes Yes fast_executemany via ODBC array binding
Oracle 11g -- 23c python-oracledb Yes Yes batcherrors for partial-batch writes
SQLite 3.x sqlite3 (stdlib) Yes Yes WAL journal mode, 64MB page cache

Architecture

All connectors implement two abstract base classes from bani.connectors.base:

  • SourceConnector -- connect(), disconnect(), introspect_schema(), read_table(), estimate_row_count()
  • SinkConnector -- connect(), disconnect(), create_table(), write_batch(), create_indexes(), create_foreign_keys(), execute_sql()

Every connector implements both interfaces (source + sink).

Arrow Interchange

Data flows as pyarrow.RecordBatch between source and sink connectors. This is a core architectural invariant:

Source DB  --[read_table()]--> RecordBatch --[write_batch()]--> Target DB

No intermediate Pandas DataFrames, dict-of-lists, or ORM objects are used.

Type Mapping

Each connector has a type mapper with two directions:

  • Source side: Maps native DB types to Arrow types during introspect_schema().
  • Sink side: from_arrow_type() maps Arrow type strings back to native DDL types during create_table().

This gives N mappers (one per connector) instead of N*N cross-database translation tables. See Type Mappings for the complete mapping tables.

Connector Discovery

Connectors are registered via Python entry points in pyproject.toml:

[project.entry-points."bani.connectors"]
mysql = "bani.connectors.mysql:MySQLConnector"
postgresql = "bani.connectors.postgresql:PostgreSQLConnector"
mssql = "bani.connectors.mssql:MSSQLConnector"
oracle = "bani.connectors.oracle:OracleConnector"
sqlite = "bani.connectors.sqlite:SQLiteConnector"

The ConnectorRegistry discovers all registered connectors at runtime. The orchestrator never references a concrete connector class directly.

Connection Pooling

All connectors use a ConnectionPool that creates multiple connections for parallel table transfers. The pool size is controlled by the parallel_workers project option.


Common Connection Config

All connectors accept credentials via environment variable references:

<source connector="postgresql">
  <connection host="localhost"
              port="5432"
              database="mydb"
              username="${env:PG_USER}"
              password="${env:PG_PASS}" />
</source>

The ${env:VAR_NAME} syntax is resolved at runtime by each connector's _resolve_env_var() method.