Point-in-Time Recovery (PITR) in PostgreSQL: A Complete Disaster Recovery Guide
Point-in-Time Recovery (PITR) lets you restore a PostgreSQL database to any exact moment, which is crucial for disaster recovery. This guide shows how to use PITR in PostgreSQL, walking through a real-world accidental data deletion scenario.
Introduction
Accidental data loss can severely impact business operations, and relying on traditional daily or weekly backups risks losing hours or even days of critical information. Point-in-Time Recovery (PITR) in PostgreSQL provides a powerful solution by allowing restoration to any precise moment in time, minimizing data loss to just seconds. This article presents a detailed, practical guide to implementing and executing PITR, illustrated with a realistic disaster recovery scenario to help safeguard your data effectively.
Table of Contents
Understanding PITR Architecture
Prerequisites and Setup
The Disaster Scenario
Complete Recovery Process
Advanced Recovery Scenarios
Post-Recovery Best Practices
Conclusion
Understanding PITR Architecture
How PITR Works
PITR combines two essential parts:
Base Backups: Full database snapshots (e.g., via pg_basebackup) taken at regular intervals (typically daily)
WAL Archives: Continuous archiving of Write-Ahead Log files containing all database changes
When disaster strikes, PostgreSQL can replay WAL files from the base backup point up to any desired timestamp, effectively reapplying database history to recreate the exact state at that moment.
The Recovery Timeline
Base Backup          WAL Archiving          Recovery Point      Disaster
     |                     |                      |                 |
     v                     v                      v                 v
 04:00:00 ─────────────────────────────────── 14:34:00 ─────── 14:35:15
 Yesterday        Continuous WAL logs        Target Time       Data Loss
Prerequisites and Setup
1. Configure WAL Archiving
First, enable continuous archiving in PostgreSQL's configuration:
# PostgreSQL Configuration for WAL and Archiving
# File: /var/lib/postgresql/15/main/postgresql.conf
# Sets the level of Write-Ahead Logging
# 'replica' enables enough WAL data for replication and PITR
wal_level = replica
# Enables the archiving of WAL (Write-Ahead Log) files
# to allow point-in-time recovery
archive_mode = on
# Command executed to archive completed WAL segments
# Copies WAL files to the backup directory if not already present
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'
# Maximum number of concurrent WAL sender processes
# Needed for streaming replication and pg_basebackup -X stream
max_wal_senders = 3
# Time interval between automatic WAL checkpoints
# Shorter intervals mean more frequent checkpoints
checkpoint_timeout = 5min
# Sets the maximum size of WAL files before a checkpoint is triggered
# Controls WAL file generation and disk usage
max_wal_size = 1GB
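Once archiving is enabled, verify that segments are actually reaching the archive before you rely on it. Below is a minimal sketch; the check_archiving helper and the sample counter values are illustrative, while pg_stat_archiver and pg_switch_wal() are standard PostgreSQL facilities.

```shell
#!/bin/bash
# Sanity-check WAL archiving health.
# The counters come from the built-in pg_stat_archiver view, e.g.:
#   psql -U postgres -At -F' ' \
#     -c "SELECT archived_count, failed_count FROM pg_stat_archiver;"
# (SELECT pg_switch_wal(); forces a segment switch if nothing has archived yet.)

check_archiving() {
  local archived=$1 failed=$2
  if (( archived > 0 && failed == 0 )); then
    echo "OK: ${archived} segments archived, no failures"
  else
    echo "WARNING: archived=${archived} failed=${failed} - inspect archive_command"
  fi
}

# Example with illustrative counter values:
check_archiving 42 0
```

A non-zero failed_count usually means the archive_command is failing (bad path, permissions, or a full disk), so check it before trusting your backups.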
2. Prepare Archive Infrastructure
# Create secure archive directory
sudo mkdir -p /backup/wal_archive
sudo chown postgres:postgres /backup/wal_archive
sudo chmod 700 /backup/wal_archive
# Restart PostgreSQL to apply configuration
sudo systemctl restart postgresql
3. Automate Daily Base Backups
The sample script below serves as a reference; customize it to fit your needs.
#!/bin/bash
# Daily pg_basebackup script (schedule via cron)
set -euo pipefail
BACKUP_DIR="/backup/base/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Create base backup with WAL streaming
pg_basebackup -h localhost -U postgres \
-D "$BACKUP_DIR" \
-Ft -z -P -X stream \
--checkpoint=fast
echo "Backup completed: $BACKUP_DIR"
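Daily backups accumulate quickly, so pair the backup job with a retention policy. A hedged sketch follows; the prune_old_backups helper, the /backup/base layout, and the 7-day window are assumptions to adapt to your site.

```shell
#!/bin/bash
# Prune day-stamped base backup directories older than a retention window.
# Relies on find's -mtime; paths and window below are illustrative.

prune_old_backups() {
  local root=$1 retention_days=$2
  # Remove only top-level, day-stamped directories past the window.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$retention_days" \
    -exec rm -rf {} +
}

# Example: keep one week of base backups
# prune_old_backups /backup/base 7
```

Keep the retention window at least as long as the oldest point you might need to recover to, since PITR needs a base backup older than the recovery target.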
The Disaster Scenario
You have been notified by the operations team: a team member intended to delete outdated order data but unintentionally ran a destructive query that wiped records the business depends on.
-- Intended command:
DELETE FROM orders WHERE status = 'pending';
-- Actual command executed (missing WHERE clause):
DELETE FROM orders;
-- Result: All 10,000 orders deleted!
Immediate Response Protocol
Document the exact disaster time (i.e., when the DELETE ran on the database):
date
# Fri Jan  1 14:35:15 UTC 2021
Record the current WAL position:
psql -U postgres -d ecommerce_db -c "SELECT pg_current_wal_lsn();"
The recovery target is set to 14:34:00, a full minute boundary just before the disaster occurred.
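The target timestamp can be derived mechanically rather than by hand, which avoids typos under pressure. A small sketch assuming GNU date; the truncate-then-subtract approach mirrors the "minute before the disaster" rule used here.

```shell
#!/bin/bash
# Derive a recovery target one whole minute before the disaster time.
# Assumes GNU date (-d for relative date arithmetic).

DISASTER_TIME="2021-01-01 14:35:15"

# Truncate seconds, then step back one minute: 14:35:15 -> 14:35:00 -> 14:34:00
MINUTE=$(date -u -d "$DISASTER_TIME UTC" +"%Y-%m-%d %H:%M:00")
RECOVERY_TARGET=$(date -u -d "$MINUTE UTC - 1 minute" +"%Y-%m-%d %H:%M:%S")

echo "recovery_target_time = '$RECOVERY_TARGET'"
# prints: recovery_target_time = '2021-01-01 14:34:00'
```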
Complete PITR Recovery Process
Step 1: Preserve the Existing Data Directory
# Stop PostgreSQL service
sudo systemctl stop postgresql
# Backup corrupted database for analysis
sudo mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main_corrupted_backup
Step 2: Prepare Recovery Directory
# Create new data directory
sudo mkdir -p /var/lib/postgresql/15/main
sudo chown postgres:postgres /var/lib/postgresql/15/main
sudo chmod 700 /var/lib/postgresql/15/main
Step 3: Restore Base Backup
# Navigate to data directory
cd /var/lib/postgresql/15/main
# Extract yesterday's base backup
sudo -u postgres tar -xzf /backup/base/2020-12-31/base.tar.gz
# Extract WAL files
sudo -u postgres mkdir -p pg_wal
sudo -u postgres tar -xzf /backup/base/2020-12-31/pg_wal.tar.gz -C pg_wal/
# Ensure proper ownership
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
Step 4: Configure Point-in-Time Recovery
PostgreSQL 15 configures recovery with two pieces: a recovery.signal marker file plus recovery settings in postgresql.conf (replacing the standalone recovery.conf removed in version 12):
# Create recovery signal file
sudo -u postgres touch /var/lib/postgresql/15/main/recovery.signal
# Add recovery configuration to postgresql.conf
sudo tee -a /var/lib/postgresql/15/main/postgresql.conf << EOF
# PITR Recovery Configuration
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_time = '2021-01-01 14:34:00'
recovery_target_action = 'promote'
recovery_target_inclusive = false
EOF
The recovery.signal file tells PostgreSQL to start recovery on next restart.
restore_command defines how to restore archived WAL segments.
recovery_target_time specifies the exact time to recover to.
recovery_target_action specifies what to do upon reaching the target time: promote ends recovery and resumes normal operations (other options: pause or shutdown).
recovery_target_inclusive = false means recovery stops just before the specified time, excluding transactions committed exactly at it.
Step 5: Execute Recovery
# Start PostgreSQL in recovery mode
sudo systemctl start postgresql
# Monitor recovery progress
sudo tail -f /var/log/postgresql/postgresql-15-main.log
Recovery Log Output:
LOG: starting point-in-time recovery to 2021-01-01 14:34:00+00
LOG: restored log file "000000010000000000000042" from archive
LOG: redo starts at 0/42000028
LOG: recovery stopping before commit of transaction 12345, time 2021-01-01 14:34:00
LOG: selected new timeline ID: 2
LOG: archive recovery complete
Step 6: Validate Recovery Success
-- Connect to recovered database
psql -U postgres -d ecommerce_db
-- Check current timeline
SELECT timeline_id FROM pg_control_checkpoint();
-- Verify database is accessible
SELECT current_timestamp, version();
-- Check if recovery is complete
SELECT pg_is_in_recovery();
-- Should return 'f' (false) if promoted successfully
-- Verify data restoration
SELECT COUNT(*) FROM orders;
-- count
-- -------
-- 10000 -- Success! All orders restored
-- Confirm latest order timestamp is before disaster
SELECT MAX(created_at) FROM orders;
-- max
-- --------------------
-- 2021-01-01 14:33:45 -- Perfect! Before the disaster
Advanced PITR Recovery Scenarios
Scenario 1: Transaction-Specific Recovery
When you know the exact problematic transaction:
# Recovery to specific transaction ID
recovery_target_xid = '12345'
recovery_target_action = 'promote'
Scenario 2: Cautious Recovery with Manual Verification
For critical systems requiring verification before promotion:
# Pause recovery for manual inspection
recovery_target_time = '2021-01-01 14:34:00'
recovery_target_action = 'pause'
-- Verify data integrity while in read-only mode
SELECT COUNT(*) FROM orders WHERE created_at > '2021-01-01 14:34:00';
-- Manually promote to production when satisfied
SELECT pg_promote();
Scenario 3: Latest Point Recovery
For maximum data retention when the exact disaster time is unknown:
# Recover to latest available WAL
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_action = 'promote'
# No recovery_target_time specified
Post-Recovery Best Practices
1. Configuration Cleanup
# Remove recovery-specific settings
sudo sed -i 's/^restore_command/#restore_command/' /var/lib/postgresql/15/main/postgresql.conf
sudo sed -i 's/^recovery_target/#recovery_target/' /var/lib/postgresql/15/main/postgresql.conf
# Reload configuration
sudo systemctl reload postgresql
2. Database Maintenance
-- Update table statistics after recovery
ANALYZE;
-- Perform maintenance vacuum
VACUUM ANALYZE orders;
3. Comprehensive Validation
-- Verify timeline integrity
SELECT timeline_id, checkpoint_time FROM pg_control_checkpoint();
-- Check for data consistency
SELECT COUNT(*) FROM order_items oi
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.id = oi.order_id);
-- Should return 0 (no orphaned records)
-- Test application functionality
INSERT INTO orders (customer_id, amount, status)
VALUES (1, 100.00, 'pending');
Troubleshooting Common Issues
Missing WAL Files
# Error: could not open file "pg_wal/000000010000000000000012"
# Solution: Check WAL archive completeness
ls -la /backup/wal_archive/ | grep 000000010000000000000012
# If missing, recovery will stop at the gap
# Consider partial recovery or use available WAL files
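To locate where the sequence breaks, you can scan the archive for non-consecutive segment numbers. The sketch below assumes the default 16MB wal_segment_size (0x100 segments per log file) and a single timeline; find_wal_gaps is a hypothetical helper, not a PostgreSQL tool.

```shell
#!/bin/bash
# Report gaps in an archived WAL sequence by filename.
# WAL names are 24 hex digits: timeline(8) + log(8) + segment(8).
# Assumes default 16MB segments, i.e., 0x100 segments per log file.

find_wal_gaps() {
  local dir=$1 prev=-1 pos f
  for f in $(ls "$dir" | grep -E '^[0-9A-F]{24}$' | sort); do
    # Linear position: log * 256 + segment
    pos=$(( 16#${f:8:8} * 256 + 16#${f:16:8} ))
    if (( prev >= 0 && pos != prev + 1 )); then
      echo "gap before $f (missing $((pos - prev - 1)) segment(s))"
    fi
    prev=$pos
  done
}

# Example: find_wal_gaps /backup/wal_archive
```

If a gap appears before your recovery target, recovery can only replay up to the last contiguous segment; anything after the gap is unreachable.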
Permission Issues
# Error: could not create recovery.signal
# Solution: Verify PostgreSQL user ownership
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
sudo chmod 700 /var/lib/postgresql/15/main
Timeline Conflicts
# Error: timeline 2 of the primary does not match recovery target timeline 1
# Solution: This indicates successful PITR completion
# The database is now on a new timeline branch
Why This Happens:
Original Timeline (1): Your database was running normally
Recovery Process: You restored to a specific point in time
New Timeline (2): PostgreSQL creates a new timeline branch to prevent WAL conflicts
Conclusion
Point-in-Time Recovery represents a fundamental shift from traditional backup strategies, offering organizations the ability to recover from data disasters with surgical precision. The implementation demonstrated in this article—recovering from accidental deletion of orders with only one minute of data loss—illustrates the powerful protection PITR provides.
Key Takeaways
Proactive Planning: PITR requires advance setup with WAL archiving and regular base backups
Rapid Response: Quick identification and isolation of the disaster minimizes data loss
Precise Recovery: Target-specific recovery points enable optimal data retention
Comprehensive Testing: Regular disaster recovery drills ensure system reliability
Remember: disasters are not a matter of "if" but "when." Organizations that implement robust PITR strategies today will be prepared to respond effectively when faced with data loss emergencies tomorrow.