
Zero-Downtime Database Migrations in Production


Downtime Costs Money — Literally

When your platform handles live trades and deposits 24/7, "we'll run the migration during a maintenance window" isn't always an option. Markets don't sleep, and clients in different timezones are always active.

Zero-downtime migrations are achievable with discipline and the right patterns. Here's how we do it across our fintech platforms.

The Expand-Contract Pattern

Every breaking schema change follows three phases:

Phase 1: Expand

Add the new column/table alongside the old one. Both work simultaneously:

php
// Migration: Add new column, keep old one
Schema::table('clients', function (Blueprint $table) {
    $table->string('full_name')->nullable()->after('last_name'); // after() only affects column order on MySQL
});
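
During the expand phase, new writes must also populate the new column, or the backfill will chase a moving target. One way to do this (a sketch, assuming an Eloquent Client model with first_name and last_name attributes) is a model saving hook:

php
// Hypothetical: keep full_name in sync on every write during the expand phase
class Client extends Model
{
    protected static function booted(): void
    {
        static::saving(function (Client $client) {
            $client->full_name = trim("{$client->first_name} {$client->last_name}");
        });
    }
}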

Phase 2: Migrate Data

Backfill the new column from existing data:

php
// Run as a queued job, not in the migration
class BackfillClientFullName implements ShouldQueue
{
    public function handle(): void
    {
        Client::query()
            ->whereNull('full_name')
            ->chunkById(1000, function ($clients) {
                foreach ($clients as $client) {
                    $client->update([
                        'full_name' => trim("{$client->first_name} {$client->last_name}"),
                    ]);
                }
            });
    }
}

Phase 3: Contract

Once all code reads from the new column and the backfill is complete, remove the old columns in a future release:

php
// Only after confirming no code references first_name/last_name
Schema::table('clients', function (Blueprint $table) {
    $table->dropColumn(['first_name', 'last_name']);
});

The key: each phase is a separate deployment. Never combine expand and contract in the same release.

Backward-Compatible Changes

Some migrations are inherently safe:

Safe (no downtime risk):

  • Adding a nullable column
  • Adding a new table
  • Adding an index (with CONCURRENTLY in PostgreSQL)
  • Increasing a column's length

Dangerous (requires expand-contract):

  • Renaming a column
  • Changing a column's type
  • Removing a column
  • Adding a NOT NULL constraint to an existing column

php
// Safe: Adding an index concurrently in PostgreSQL
class AddIndexToTradesTable extends Migration
{
    // CREATE INDEX CONCURRENTLY cannot run inside a transaction,
    // so disable Laravel's migration transaction wrapper
    public $withinTransaction = false;

    public function up(): void
    {
        // A standard ->index() call issues a plain CREATE INDEX,
        // which blocks writes for the duration; use raw SQL instead
        DB::statement(
            'CREATE INDEX CONCURRENTLY idx_trades_client_symbol
             ON trades (client_id, symbol)'
        );
    }
}
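
The NOT NULL case from the list above also has a lock-friendly path in PostgreSQL 12+: add a CHECK constraint as NOT VALID, validate it in the background, then let SET NOT NULL reuse the validated constraint instead of scanning the table. A sketch (table and column names are illustrative):

sql
-- Step 1: instant; only new rows are checked
ALTER TABLE trades ADD CONSTRAINT trades_symbol_not_null
    CHECK (symbol IS NOT NULL) NOT VALID;

-- Step 2: scans existing rows without blocking writes
ALTER TABLE trades VALIDATE CONSTRAINT trades_symbol_not_null;

-- Step 3: PostgreSQL 12+ skips the full-table scan because the
-- validated CHECK constraint already proves no NULLs exist
ALTER TABLE trades ALTER COLUMN symbol SET NOT NULL;

-- Optional: drop the now-redundant CHECK constraint
ALTER TABLE trades DROP CONSTRAINT trades_symbol_not_null;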

Large Table Migrations

Adding a column to a table with 50 million rows can lock the table for minutes. Strategies:

PostgreSQL: Nearly Instant

PostgreSQL handles ALTER TABLE ADD COLUMN with a default value without rewriting the table (since PG 11), provided the default is a constant or otherwise non-volatile:

sql
-- This is instant, even on a 50M row table
ALTER TABLE trades ADD COLUMN commission_paid BOOLEAN DEFAULT false;

Backfilling in Batches

Never backfill in a single transaction:

php
class BackfillCommissionPaid implements ShouldQueue
{
    public int $timeout = 3600;

    public function handle(): void
    {
        $processed = 0;

        Trade::query()
            ->where('commission_paid', false)
            ->whereNotNull('closed_at')
            ->chunkById(5000, function ($trades) use (&$processed) {
                $ids = $trades->pluck('id');

                Trade::whereIn('id', $ids)->update([
                    'commission_paid' => true,
                ]);

                $processed += $ids->count();

                // Log progress
                Log::info("Backfill progress: {$processed} trades updated");

                // Throttle to avoid overwhelming the database
                usleep(100000); // 100ms pause between batches
            });
    }
}

The usleep between batches prevents the backfill from consuming all database CPU and affecting live queries.

Foreign Key Considerations

Adding foreign keys to large tables can be expensive. Use a two-step approach:

php
// Step 1: Add the constraint as NOT VALID (instant)
DB::statement('
    ALTER TABLE commissions
    ADD CONSTRAINT fk_commissions_trade_id
    FOREIGN KEY (trade_id) REFERENCES trades(id)
    NOT VALID
');

// Step 2: Validate in the background (doesn't lock)
DB::statement('
    ALTER TABLE commissions
    VALIDATE CONSTRAINT fk_commissions_trade_id
');

NOT VALID adds the constraint for new rows immediately. VALIDATE checks existing rows without locking the table.
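
If you want to confirm the validation actually completed (say, from a deployment script), the constraint's status is visible in the system catalog. A quick check, assuming the constraint name from the example above:

sql
-- convalidated flips to true once VALIDATE CONSTRAINT has finished
SELECT conname, convalidated
FROM pg_constraint
WHERE conname = 'fk_commissions_trade_id';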

Testing Migrations

Every migration runs in CI against a copy of the production schema:

yaml
# GitLab CI
migration-test:
  stage: test
  script:
    # Restore latest production schema snapshot
    - pg_restore --schema-only -d test_db latest_schema.dump
    # Run pending migrations
    - php artisan migrate --database=testing
    # Run rollback
    - php artisan migrate:rollback --database=testing
    # Verify schema matches expected state
    - php artisan schema:verify

This catches issues before they reach production:

  • Syntax errors in migration SQL
  • Conflicts with existing columns or constraints
  • Migrations that fail on the actual production schema

Rollback Strategy

Every migration must have a working down() method:

php
class AddCommissionTierToIBs extends Migration
{
    public function up(): void
    {
        Schema::table('ibs', function (Blueprint $table) {
            $table->string('commission_tier')->default('standard')->after('status');
        });
    }

    public function down(): void
    {
        Schema::table('ibs', function (Blueprint $table) {
            $table->dropColumn('commission_tier');
        });
    }
}

For expand-contract migrations, rollback means the code continues using the old column — the new column simply sits unused until you clean it up.
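
In practice that means the expand migration's down() simply drops the new column; a sketch for the full_name example from earlier:

php
// Rolling back the expand phase: the old columns were never touched,
// so dropping the new one restores the previous state exactly
public function down(): void
{
    Schema::table('clients', function (Blueprint $table) {
        $table->dropColumn('full_name');
    });
}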

Key Takeaways

  1. Expand-contract pattern for every breaking change — expand, backfill, contract in separate deployments
  2. Know which operations are safe — adding nullable columns and tables is fine; renaming and removing requires care
  3. Backfill in batches with throttling — never one giant UPDATE statement
  4. Use PostgreSQL's concurrent operations — no, rather: use CREATE INDEX CONCURRENTLY and NOT VALID constraints
  5. Test migrations against production schema in CI — catch issues before deployment
  6. Always write rollback methods — you will need them eventually

Zero-downtime migrations require more discipline than "just run it," but in fintech where every minute of downtime costs money and trust, the extra effort pays for itself.
