Live App →

PostgreSQL

Relational database for structured data requiring ACID guarantees.


Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        Application Layer                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │  Auth API    │  │  Audit Log   │  │ Cost Tracker │              │
│  │              │  │   Ingestor   │  │              │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      PgBouncer (Connection Pool)                      │
│                  max_client_conn=1000, pool_mode=transaction          │
└─────────────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────────┐
│           Amazon RDS PostgreSQL 15 — Multi-AZ (prod)                │
│  ┌─────────────────┐              ┌─────────────────┐               │
│  │    Primary      │◄────────────►│    Standby      │               │
│  │   (us-east-1a)  │   Sync Rep   │   (us-east-1b)  │               │
│  └─────────────────┘              └─────────────────┘               │
│         ▲                                                           │
│         │ Read Replica (us-west-2) ──► Analytics / Reporting        │
│         │                                                           │
│    Automated Backups ──► S3 (cross-region copy)                     │
└─────────────────────────────────────────────────────────────────────┘

Use Cases

Table Purpose
users Auth credentials, roles, MFA status
sessions JWT token metadata, expiry
audit_logs Immutable security events
invocations LLM call audit trail
costs Per-tenant daily/monthly spend

Configuration

  • Engine: Amazon RDS PostgreSQL 15
  • Instance: db.r6g.large (prod), db.t3.medium (dev)
  • Multi-AZ: Enabled (prod)
  • Backup: Daily, 7-day retention
  • Encryption: AES-256 at rest, TLS in transit

Connection Patterns & Client Libraries

Primary Driver: psycopg2-binary (low-level) / SQLAlchemy (ORM) — versions >=2.9.0 / >=2.0.0

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "postgresql+psycopg2://user:pass@pgbouncer:6432/sentinel",
    poolclass=QueuePool,
    pool_size=20,
    max_overflow=10,
    pool_pre_ping=True,
    pool_recycle=3600
)

PgBouncer Configuration (prod):

  • pool_mode = transaction — connection returned to pool after each transaction.
  • max_client_conn = 1000
  • default_pool_size = 20 per user/database pair.
  • Listen port: 6432 (separate from direct RDS port 5432 for admin access).

Connection Best Practices:

  • Always use pool_pre_ping=True to avoid stale connections after failover.
  • Set application_name in connection string for per-service query tracking in pg_stat_activity.
  • Use read replicas for heavy reporting queries to offload the primary.

Backup & Disaster Recovery

Strategy Frequency Retention RPO RTO
Automated Backups Daily + transaction logs 7 days 5 minutes < 30 minutes
Manual DB Snapshots Pre-release Indefinite Zero < 30 minutes
Cross-Region Read Replica Async replication N/A < 1 minute < 15 minutes (promote)
Logical Exports (pg_dump) Weekly (schema-only) 30 days N/A Hours

DR Runbook:

  1. Initiate failover: RDS automatically promotes standby in Multi-AZ; verify with aws rds describe-db-instances.
  2. For regional DR, promote cross-region read replica: aws rds promote-read-replica.
  3. Update PgBouncer DNS target via Route 53 failover routing.
  4. Verify data consistency with pg_dump --schema-only checksum comparison.

Performance Tuning

Parameter Group Settings (prod):

Parameter Value Rationale
shared_buffers 25% of RAM Cache hot tables and indexes
effective_cache_size 75% of RAM Optimizer hint for index vs. seq scan
work_mem 256 MB Sort/hash operations per connection
maintenance_work_mem 1 GB VACUUM, CREATE INDEX speed
max_connections 500 Upper bound before PgBouncer saturation
log_min_duration_statement 1000 ms Capture slow queries for tuning

Index Maintenance:

  • Run VACUUM ANALYZE nightly via RDS maintenance window.
  • Monitor index bloat with pgstattuple extension; rebuild when bloat > 30%.
  • Partition audit_logs and invocations by month (native partitioning, PostgreSQL 15).

Query Optimization:

  • Use EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for execution plan review.
  • Add partial indexes for soft-deleted rows: WHERE deleted_at IS NULL.

Monitoring & Alerting

CloudWatch Metrics:

Metric Threshold Severity
CPUUtilization > 80% for 5 min Warning
CPUUtilization > 95% for 2 min Critical
FreeableMemory < 1 GB Critical
DatabaseConnections > 400 Warning
ReadLatency p99 > 20 ms Warning
WriteLatency p99 > 20 ms Warning
ReplicaLag > 30 seconds Warning
TransactionLogsDiskUsage > 50 GB Warning

Performance Insights:

  • Enabled on prod; review top SQL by wait state weekly.
  • Focus on IO:DataFileRead and Lock:Relation wait events.

Log Exports:

  • PostgreSQL logs → CloudWatch Logs → OpenSearch for full-text search.
  • Slow query logs → weekly tuning report via Athena.