migrations.guide
Will my migration cause an application outage?
Unsafe due to default, but alternative is possible.
It is unsafe to add a new column with a default:
ALTER TABLE table ADD COLUMN column INT DEFAULT 0;
All queries of every kind will be blocked until the migration is complete, because an ACCESS EXCLUSIVE lock is required and, prior to Postgres 11, the entire table must be rewritten to fill in the default, which can take a very long time on large tables.
However, if the application is written with this in mind and migration timeouts are used, then it is possible to eventually do this safely.
See the HOW-TO: Obtain a lock safely for a migration section below for a step-by-step guide on what to do.
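One common alternative on Postgres versions before 11 (a sketch only, not the HOW-TO itself; table, column, and the default value are placeholders) is to split the operation so that the long table rewrite never happens:

-- 1. Add the column without a default: a brief ACCESS EXCLUSIVE lock, no rewrite.
ALTER TABLE table ADD COLUMN column INT;
-- 2. Set the default for future rows only: a catalog-only change.
ALTER TABLE table ALTER COLUMN column SET DEFAULT 0;
-- 3. Backfill existing rows in batches (see the data migration section below).
UPDATE table SET column = 0 WHERE column IS NULL AND id < 1000;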
Somewhat safe.
It is only somewhat safe to add a new column without a default:
ALTER TABLE table ADD COLUMN column INT;
Although the ALTER itself completes quickly (it is a catalog-only change with no table rewrite), an ACCESS EXCLUSIVE lock is still required, so all queries of every kind will be blocked until the migration is complete.
However, if the application is written with this in mind and migration timeouts are used, then it is possible to eventually do this safely.
See the HOW-TO: Obtain a lock safely for a migration section below for a step-by-step guide on what to do.
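As a rough sketch of what obtaining the lock safely involves (the 2-second lock_timeout is an assumed value; the HOW-TO below is the authoritative procedure):

SET lock_timeout = '2s';
ALTER TABLE table ADD COLUMN column INT;
-- If this fails with "canceling statement due to lock timeout", the ALTER
-- gave up waiting instead of queueing behind a long-running query (and
-- blocking everything queued after it). Simply retry until it succeeds.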
Warning: NULL columns cannot be made into NOT NULL columns safely after-the-fact until Postgres 12. So if you are actually intending to make a NOT NULL column, create a NOT NULL column with a dummy default value.
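For example (placeholder names; note that on Postgres versions before 11 this brings back the default-related table rewrite discussed above):

ALTER TABLE table ADD COLUMN column INT NOT NULL DEFAULT 0;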
This is unsafe.
Your best bet is to compromise: add a NULL column and a separate check constraint.
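A minimal sketch of that compromise, using placeholder names and the same NOT VALID / VALIDATE pattern this guide uses for constraints:

ALTER TABLE table ADD COLUMN column INT;
ALTER TABLE table ADD CONSTRAINT column_not_null CHECK (column IS NOT NULL) NOT VALID;
-- Backfill the column in batches (see the data migration section below), then:
ALTER TABLE table VALIDATE CONSTRAINT column_not_null;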
Add NOT NULL - safe alternative possible.
It is strictly unsafe to add an index the "normal" way:
CREATE INDEX name_idx ON table (column);
CREATE UNIQUE INDEX name_idx ON table (column);
A full table scan is required, which will take a long time to complete for large tables, and, as a SHARE lock is obtained, all writes will be blocked until the migration is complete.
Create the index CONCURRENTLY instead. This happens in the background and will not block writes.
SET statement_timeout = 0;  -- an index build can exceed any statement timeout
SET lock_timeout = 0;       -- and should not be cancelled while waiting for its locks
CREATE INDEX CONCURRENTLY name_idx ON table (column);
Note: the creation could be unsuccessful due to deadlocks or a uniqueness violation. If it fails, the index is left behind marked INVALID, which you can spot in the output of:
\d table
Drop the invalid index and retry the CREATE INDEX CONCURRENTLY:
DROP INDEX name_idx;
The explanations above describe the failure conditions of the unsafe way of adding indexes. If the safe alternative is followed, then these problems do not apply.
Incomplete - add constraint.
Incomplete - add foreign key.
Good luck with that mate.
A rename must be treated as adding a column, then backfilling data, then removing a column, which means it has the pitfalls of all three, along with the problem of synchronising writes to the old and new columns.
The remainder of this section will detail the synchronisation process only. Other sections of this guide (reached by following the flow chart again) detail how to do each of the other steps above safely.
There are three possible strategies for synchronising writes to both columns:
NOTE: This section is incomplete and does not yet tell you how to do each strategy; one of them is sketched below.
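Purely as an illustration (this is not from the guide, and all names are hypothetical), the trigger-based strategy mirrors every write to the old column onto the new one:

-- Hypothetical sketch: keep new_column in sync with old_column on every write.
CREATE FUNCTION mirror_column() RETURNS trigger AS $$
BEGIN
  NEW.new_column := NEW.old_column;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mirror_column_trigger
  BEFORE INSERT OR UPDATE ON table
  FOR EACH ROW EXECUTE PROCEDURE mirror_column();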
It is unsafe to make a column NOT NULL:
ALTER TABLE table ALTER COLUMN column SET NOT NULL;
All queries of every kind will be blocked until the migration is complete because an ACCESS EXCLUSIVE lock is required. Since a full table scan is required to validate the constraint, this could take a long time for large tables.
Create a NOT NULL check constraint instead.
ALTER TABLE table ADD CONSTRAINT constraint CHECK (column IS NOT NULL) NOT VALID;
ALTER TABLE table VALIDATE CONSTRAINT constraint;
In order to do this safely, please see Add Something > Add a Constraint. Sorry to redirect you!
That's it... Until Postgres 12, where this check constraint can be converted into a proper NOT NULL constraint.
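On Postgres 12 and later, the conversion looks like this (placeholder names; SET NOT NULL can skip the full table scan because the validated check constraint already proves no NULLs exist):

ALTER TABLE table ALTER COLUMN column SET NOT NULL;
ALTER TABLE table DROP CONSTRAINT constraint;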
What are the disadvantages of a NOT NULL check constraint?
A check constraint is not completely equivalent to a real NOT NULL, because the constraint belongs to the table rather than being an option on the column: \d, for example, lists it among the table's check constraints instead of marking the column itself as not null.
The following kinds of data migrations are unsafe:
UPDATE table SET column = value;
UPDATE table SET column = value WHERE condition;
Where many rows are updated, this will take a long time to complete, and, as a FOR UPDATE lock is obtained on the selected rows, writes to those rows will be blocked until the migration is complete.
Modify the data in batches that take about a second each to execute. This will reduce the amount of time the lock is held.
Execute the script from a persistent shell session, so that a dropped connection does not interrupt it partway through:
#!/bin/bash
set -e
for i in {0..100}; do
    echo "=== === === Batch ${i}"
    psql -v ON_ERROR_STOP=1 -v v1="${i}" -f migration.sql
done
The migration.sql file:
SET statement_timeout = 2000;
BEGIN;
\timing on
UPDATE table
SET column = value
WHERE id >= (:v1 + 0) * 1000
  AND id < (:v1 + 1) * 1000;
COMMIT;
In this example, we use the primary key of the table to chunk the rows into batches of 1000, and the start and stop range 0..100 was determined manually. The explicit transaction is redundant here, since a single UPDATE is already atomic, but it may be necessary for more complex migrations. Furthermore, we enable timing so that we have insight into how long each update takes. Finally, it is trivial to stop and resume the migration by altering the start of the range.
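If you would rather derive the range than guess it, a hypothetical helper (assuming a dense integer primary key) is:

SELECT max(id) / 1000 AS last_batch FROM table;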