SQLAlchemy and PgBouncer
At work we use SQLAlchemy as our ORM, and we were having issues with some connections getting stuck in the pool. We use PgBouncer as our connection pooler, so I started looking into the issue and I found that there's not a lot of information out there about how to configure SQLAlchemy and PgBouncer together. I found a few blog posts and StackOverflow answers, but they were all incomplete or outdated. So I decided to write this post to document what I've learned.
The problem
By default SQLAlchemy uses a connection pool, which is a good thing. It means that you don't have to open a new connection to the database every time you need to execute a query. Instead, you can reuse an existing connection from the pool. This is much faster and more efficient than opening a new connection every time. This is fine and dandy if you are not using PgBouncer, but if you are, then you need to be careful about how you configure SQLAlchemy.
The problem is that if you use a connection pool with PgBouncer (another connection pooler), then you can end up with
connections that are stuck in a funny state in SA pool and the next time you try to use them you'll get an error like
sqlalchemy.exc.ResourceClosedError: This Connection is closed
or
psycopg2.OperationalError) server closed the connection unexpectedly
. This is due to SQLAlchemy's default settings
which make you use QueuePool as the
default pool class. These settings are not compatible with PgBouncer, so you need to use a different pool class or tweak
the settings.
SQLAlchemy configuration
The first thing I would recommend is to set application_name
in your SQLAlchemy engine. This will make it easier to
debug issues in PgBouncer, as you'll be able to see which application is using which connection. You can do this by
adding application_name
to your connection string:
from sqlalchemy import create_engine
engine = create_engine("postgresql://user:pass@host:port/db?application_name=your-app-name")
Connection pooling
As you are using an external connection pooler (PgBouncer), you need to ensure that your SQLAlchemy's connection pool
settings are compatible with it. The default settings are not, so you need to tweak them. The main setting you need to
change is the pool class. The default pool class is QueuePool
, which is not compatible with PgBouncer. You need to
pick a different pool class, and you also need to tweak some of the settings depending on your workload.
The optimal solution depends on your workload and the version of SQLAlchemy you are using. The simplest and safest approach is to disable connection pooling in SQLAlchemy altogether. For that you just need to configure NullPool as your pool class. This will disable the connection pool and open a new connection every time you need to execute a query. This is the safest option and easiest option, although slightly less performant than other alternatives:
from sqlalchemy import create_engine, pool
engine = create_engine("postgresql://user:pass@host:port/db", poolclass=pool.NullPool)
Depending on your workload you have more options though.
Web applications
For web applications just use NullPool, this is the safest option and there is no performance hit as your web server opens a connection once and then reuses it for the lifetime of the process. If you're using Flask-SQLAlchemy you can configure it like this:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config.update({
"SQLALCHEMY_DATABASE_URI": "postgresql://user:pass@host:port/db",
"SQLALCHEMY_ENGINE_OPTIONS": {"poolclass": "sqlalchemy.pool.NullPool"},
})
db = SQLAlchemy(app)
Long-running processes
If you have workers/background jobs using PgBouncer then you definitely want to use NullPool in your SQLAlchemy engine. This is the safest option and there is no performance hit as your worker opens a connection once and then reuses it for the lifetime of the process.
If you're using SQLAlchemy 1.4+ the
new autobegin behavior
is exactly what you need. If you are using SQLAlchemy 1.3 or earlier, I recommend you set autocommit=True
in your
engine. This will ensure that the connection is always in autocommit mode, avoiding leaving transactions stuck open.
The downside of using autocommit=True
is that you need to explicitly start a transaction if you want to use one. This
is what you want anyway, the 1.3 behavior of starting a transaction automatically under the hood is a bad idea that
doesn't work well with connection pooling.
PgBouncer configuration
The PgBouncer configuration will again depend on your workload, but most likely you want to set pool_mode
to
transaction
. If you're using QueuePool
make sure that pool_recycle
is set to a value lower than the PgBouncer
server_lifetime
setting. This will ensure that the connections are recycled before PgBouncer closes them.