Developers
PostgreSQL
Relational store for auth, RBAC, audit logs, and cost tracking.
PostgreSQL
Relational database for structured data requiring ACID guarantees.
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Auth API │ │ Audit Log │ │ Cost Tracker │ │
│ │ │ │ Ingestor │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ PgBouncer (Connection Pool) │
│ max_client_conn=1000, pool_mode=transaction │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Amazon RDS PostgreSQL 15 — Multi-AZ (prod) │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Primary │◄────────────►│ Standby │ │
│ │ (us-east-1a) │ Sync Rep │ (us-east-1b) │ │
│ └─────────────────┘ └─────────────────┘ │
│ ▲ │
│ │ Read Replica (us-west-2) ──► Analytics / Reporting │
│ │ │
│ Automated Backups ──► S3 (cross-region copy) │
└─────────────────────────────────────────────────────────────────────┘
Use Cases
| Table | Purpose |
|---|---|
users |
Auth credentials, roles, MFA status |
sessions |
JWT token metadata, expiry |
audit_logs |
Immutable security events |
invocations |
LLM call audit trail |
costs |
Per-tenant daily/monthly spend |
Configuration
- Engine: Amazon RDS PostgreSQL 15
- Instance: db.r6g.large (prod), db.t3.medium (dev)
- Multi-AZ: Enabled (prod)
- Backup: Daily, 7-day retention
- Encryption: AES-256 at rest, TLS in transit
Connection Patterns & Client Libraries
Primary Driver: psycopg2-binary (low-level) / SQLAlchemy (ORM) — versions >=2.9.0 / >=2.0.0
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
"postgresql+psycopg2://user:pass@pgbouncer:6432/sentinel",
poolclass=QueuePool,
pool_size=20,
max_overflow=10,
pool_pre_ping=True,
pool_recycle=3600
)
PgBouncer Configuration (prod):
pool_mode = transaction— connection returned to pool after each transaction.max_client_conn = 1000default_pool_size = 20per user/database pair.- Listen port:
6432(separate from direct RDS port5432for admin access).
Connection Best Practices:
- Always use
pool_pre_ping=Trueto avoid stale connections after failover. - Set
application_namein connection string for per-service query tracking inpg_stat_activity. - Use read replicas for heavy reporting queries to offload the primary.
Backup & Disaster Recovery
| Strategy | Frequency | Retention | RPO | RTO |
|---|---|---|---|---|
| Automated Backups | Daily + transaction logs | 7 days | 5 minutes | < 30 minutes |
| Manual DB Snapshots | Pre-release | Indefinite | Zero | < 30 minutes |
| Cross-Region Read Replica | Async replication | N/A | < 1 minute | < 15 minutes (promote) |
| Logical Exports (pg_dump) | Weekly (schema-only) | 30 days | N/A | Hours |
DR Runbook:
- Initiate failover: RDS automatically promotes standby in Multi-AZ; verify with
aws rds describe-db-instances. - For regional DR, promote cross-region read replica:
aws rds promote-read-replica. - Update PgBouncer DNS target via Route 53 failover routing.
- Verify data consistency with
pg_dump --schema-onlychecksum comparison.
Performance Tuning
Parameter Group Settings (prod):
| Parameter | Value | Rationale |
|---|---|---|
shared_buffers |
25% of RAM | Cache hot tables and indexes |
effective_cache_size |
75% of RAM | Optimizer hint for index vs. seq scan |
work_mem |
256 MB | Sort/hash operations per connection |
maintenance_work_mem |
1 GB | VACUUM, CREATE INDEX speed |
max_connections |
500 | Upper bound before PgBouncer saturation |
log_min_duration_statement |
1000 ms | Capture slow queries for tuning |
Index Maintenance:
- Run
VACUUM ANALYZEnightly via RDS maintenance window. - Monitor index bloat with
pgstattupleextension; rebuild when bloat > 30%. - Partition
audit_logsandinvocationsby month (native partitioning, PostgreSQL 15).
Query Optimization:
- Use
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)for execution plan review. - Add partial indexes for soft-deleted rows:
WHERE deleted_at IS NULL.
Monitoring & Alerting
CloudWatch Metrics:
| Metric | Threshold | Severity |
|---|---|---|
CPUUtilization |
> 80% for 5 min | Warning |
CPUUtilization |
> 95% for 2 min | Critical |
FreeableMemory |
< 1 GB | Critical |
DatabaseConnections |
> 400 | Warning |
ReadLatency |
p99 > 20 ms | Warning |
WriteLatency |
p99 > 20 ms | Warning |
ReplicaLag |
> 30 seconds | Warning |
TransactionLogsDiskUsage |
> 50 GB | Warning |
Performance Insights:
- Enabled on prod; review top SQL by wait state weekly.
- Focus on
IO:DataFileReadandLock:Relationwait events.
Log Exports:
- PostgreSQL logs → CloudWatch Logs → OpenSearch for full-text search.
- Slow query logs → weekly tuning report via Athena.