Skip to content

SQL Server Connector

The SQL Server (MSSQL) connector supports reading from and writing to Microsoft SQL Server using pyodbc (preferred) with pymssql as a fallback.


Supported Versions

SQL Server 2019 through 2022.

Drivers

pyodbc (pyodbc>=5.0) -- Preferred. Uses ODBC Driver 18 for SQL Server with fast_executemany for high-throughput array parameter binding. 5-10x faster than pymssql for writes.

pymssql (pymssql>=2.2) -- Fallback when the ODBC driver is not available. Uses FreeTDS under the hood. The connector automatically tests pyodbc first and falls back to pymssql.


Connection Configuration

<source connector="mssql">
  <connection host="localhost"
              port="1433"
              database="ERP_Production"
              username="${env:MSSQL_USER}"
              password="${env:MSSQL_PASS}"
              encrypt="true" />
</source>
Parameter Default Description
host Required Database hostname or IP.
port 1433 Database port.
database Required Database name.
username -- Credential via ${env:VAR}.
password -- Credential via ${env:VAR}.
encrypt false Enable TLS encryption.

ODBC Driver Auto-Detection

The connector automatically sets the ODBCSYSINI environment variable to locate Homebrew's odbcinst.ini:

  • Apple Silicon (arm64): /opt/homebrew/etc
  • Intel (x86_64): /usr/local/etc
  • Linux: /etc

Set BANI_DISABLE_PYODBC=1 to force pymssql even when pyodbc is available.


Performance Features

fast_executemany (pyodbc)

When pyodbc is available, the data writer enables fast_executemany=True, which uses ODBC array parameter binding for bulk inserts. This is 5-10x faster than row-by-row inserts.

pymssql Fallback

pymssql uses FreeTDS and inline SQL literal formatting for inserts. The connector caps the connection pool to 1 for pymssql (FreeTDS has known stability issues under concurrent load with "DBPROCESS is dead" errors).

The MSSQL connector declares lower recommended defaults than other connectors:

  • recommended_batch_size = 10,000 (vs 100,000 global default)
  • recommended_parallel_workers = 2 (vs 4 global default)

These prevent source connection timeouts during long target-side writes.


Default Schema

The default schema is dbo.


Type Mapping

See Type Mappings > SQL Server for the complete mapping tables.

Key points:

  • Arrow string maps to NVARCHAR(MAX). When the source data_type carries a length <= 4000, the connector uses NVARCHAR(length) instead.
  • NVARCHAR(MAX) columns that participate in indexes are automatically altered to NVARCHAR(4000) before index creation.
  • Boolean defaults (TRUE/FALSE) are converted to 0/1 for BIT columns.
  • MSSQL treats NULLs as duplicates in unique indexes. The connector adds WHERE col IS NOT NULL filters to unique indexes.

FK Creation Resilience

The MSSQL connector has extensive FK creation error handling:

Error Recovery
1785 (cascade cycles) Retry with ON DELETE NO ACTION ON UPDATE NO ACTION
1776 (missing PK/unique) Create a unique index on referenced columns, then retry
1778 (type mismatch) Align source column types to match referenced PK types, then retry
Connection broken Reconnect and retry (up to 3 consecutive failures before aborting)

Known Limitations

  • pymssql caps the connection pool to 1 (no write parallelism).
  • SET TEXTSIZE 2147483647 is issued on connect to handle large text columns.
  • TDS version 7.3 is used for pymssql (SQL Server 2008+ compatibility).

Example BDL

<?xml version="1.0" encoding="UTF-8"?>
<bani schemaVersion="1.0">
  <project name="mssql-to-pg" />
  <source connector="mssql">
    <connection host="sqlserver.local" port="1433"
                database="ERP"
                username="${env:MSSQL_USER}"
                password="${env:MSSQL_PASS}" />
  </source>
  <target connector="postgresql">
    <connection host="localhost" port="5432"
                database="analytics"
                username="${env:PG_USER}"
                password="${env:PG_PASS}" />
  </target>
  <options>
    <batchSize>10000</batchSize>
    <parallelWorkers>2</parallelWorkers>
  </options>
</bani>