MongoDB
Document and metadata store for extractions, chat sessions, and agent configurations.
MongoDB (DocumentDB)
Document store for unstructured and semi-structured data.
Architecture
┌────────────────────────────────────────────────────────────────────────────┐
│                             Application Layer                              │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐              │
│  │  Extraction  │      │   Chat API   │      │ Agent Engine │              │
│  │   Service    │      │              │      │              │              │
│  └──────┬───────┘      └──────┬───────┘      └──────┬───────┘              │
└─────────┼─────────────────────┼─────────────────────┼──────────────────────┘
          │ secondaryPreferred  │ primaryPreferred    │ secondaryPreferred
          ▼                     ▼                     ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                 Amazon DocumentDB 5.0: 3-Node Replica Set                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                     │
│  │   Primary   │◄──►│ Secondary 1 │◄──►│ Secondary 2 │                     │
│  │  (writes)   │    │   (reads)   │    │   (reads)   │                     │
│  └─────────────┘    └─────────────┘    └─────────────┘                     │
│         ▲                                                                  │
│  Daily Snapshots ──► S3 (cross-region copy)                                │
└────────────────────────────────────────────────────────────────────────────┘
Use Cases
| Collection | Purpose | Example Document |
|---|---|---|
| `documents` | Extraction results | CAS holdings, metadata |
| `chat_sessions` | Message history | Messages, timestamps, token usage |
| `agents` | Agent configurations | System prompts, tool lists, guardrails |
| `workflows` | Workflow definitions | DAG nodes, edges, parameters |
| `users` | User profiles | Email, role, tenant, preferences |
Configuration
- Engine: Amazon DocumentDB 5.0 (MongoDB-compatible)
- Instance: db.r6g.xlarge (prod), db.t3.medium (dev)
- Replication: 3-node cluster across AZs
- Backup: Daily snapshots, 7-day retention
- Encryption: AES-256 at rest, TLS in transit
Connection Patterns & Client Libraries
Primary Driver: pymongo, version >=4.6.0
from pymongo import MongoClient

# Hosts and credentials are placeholders; keep real secrets out of source code.
client = MongoClient(
    "mongodb://user:pass@host1:27017,host2:27017,host3:27017/?"
    "replicaSet=rs0&readPreference=secondaryPreferred&tls=true"
    "&retryWrites=false",  # DocumentDB does not support retryable writes
    maxPoolSize=100,
    minPoolSize=10,
    maxIdleTimeMS=60000,
    serverSelectionTimeoutMS=5000,
)
Read Preferences by Service:
| Service | Read Preference | Reason |
|---|---|---|
| Extraction Service | `secondaryPreferred` | Heavy read load, eventual consistency acceptable |
| Chat API | `primaryPreferred` | Needs recent writes for session continuity |
| Agent Engine | `secondaryPreferred` | Config reads are mostly immutable |
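One way to keep the per-service read preferences in the table above consistent is to derive every connection string from a single mapping. The following is a minimal sketch using only the standard library; the service keys, the `build_uri` helper, and the host and credential values are hypothetical, and `retryWrites=false` is included because DocumentDB does not support retryable writes.

```python
from urllib.parse import quote_plus, urlencode

# Read preference per service, mirroring the table above.
# The dictionary keys are illustrative names, not identifiers from the codebase.
READ_PREFERENCES = {
    "extraction_service": "secondaryPreferred",
    "chat_api": "primaryPreferred",
    "agent_engine": "secondaryPreferred",
}


def build_uri(service, user, password, hosts, replica_set="rs0"):
    """Return a MongoDB connection URI for the given service (hypothetical helper)."""
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": READ_PREFERENCES[service],
        "tls": "true",
        "retryWrites": "false",
    })
    # Percent-encode credentials so characters such as '@' or ':' stay valid in the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{','.join(hosts)}/?{options}"
```

The resulting string can be passed to `MongoClient` together with the pool settings shown earlier.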
Connection Pooling:
- `maxPoolSize=100` per application instance.
- Monitor `connections.current` vs. `connections.available` to avoid exhaustion.
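Because pymongo applies `maxPoolSize` to each server it connects to, a rough upper bound on connections to any one DocumentDB instance is the number of client processes multiplied by `maxPoolSize`. The sketch below does that arithmetic against the `DatabaseConnections` warning threshold of 400 listed under Monitoring & Alerting; the function names are hypothetical, and one `MongoClient` per process is assumed.

```python
def peak_connections_per_node(client_processes, max_pool_size=100):
    """Upper bound on connections one database node can receive.

    Assumes one MongoClient per process; pymongo keeps a separate pool
    of up to max_pool_size connections for every server it talks to.
    """
    return client_processes * max_pool_size


def exceeds_warning(client_processes, max_pool_size=100, warning_threshold=400):
    """True if fully used pools could cross the connection warning threshold."""
    return peak_connections_per_node(client_processes, max_pool_size) > warning_threshold
```

Under these assumptions, four processes can reach exactly 400 connections on a node, and a fifth could push past the warning threshold if every pool fills.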
Backup & Disaster Recovery
| Strategy | Frequency | Retention | RPO | RTO |
|---|---|---|---|---|
| Automated Snapshots | Daily | 7 days | 24 hours | < 30 minutes |
| Manual Snapshots | Pre-release | Indefinite | Zero at snapshot time | < 30 minutes |
| Cross-Region Snapshot Copy | Daily (prod) | 7 days | 24 hours | < 1 hour |
| Point-in-Time Recovery (PITR) | Continuous | 7 days | 5 minutes | < 30 minutes |
DR Runbook:
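- Identify the latest valid snapshot in the DR region.
- Restore it to a new DocumentDB cluster with `aws docdb restore-db-cluster-from-snapshot`, then add instances with `aws docdb create-db-instance`. (`aws docdb restore-db-cluster-to-point-in-time` requires a source cluster identifier, so it applies to PITR rather than to a copied snapshot.)
- Update application connection strings via Parameter Store / Secrets Manager.
- Verify replica set health with `rs.status()` before redirecting traffic.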
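The first runbook step can be sketched as a small selection function. This example assumes that "valid" means a snapshot whose `Status` is `available`, and that each item has the shape of an entry in the `DBClusterSnapshots` list returned by boto3's `describe_db_cluster_snapshots`; the helper name and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def latest_valid_snapshot(snapshots):
    """Return the newest snapshot whose status is 'available', or None."""
    available = [s for s in snapshots if s.get("Status") == "available"]
    if not available:
        return None
    return max(available, key=lambda s: s["SnapshotCreateTime"])


# Hypothetical sample data, for illustration only.
sample = [
    {"DBClusterSnapshotIdentifier": "snap-a", "Status": "available",
     "SnapshotCreateTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"DBClusterSnapshotIdentifier": "snap-b", "Status": "available",
     "SnapshotCreateTime": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"DBClusterSnapshotIdentifier": "snap-c", "Status": "creating",
     "SnapshotCreateTime": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

chosen = latest_valid_snapshot(sample)
```

Here `snap-c` is newer but still being created, so the function picks `snap-b`.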
Performance Tuning
Indexing Strategy:
| Collection | Index | Type | Notes |
|---|---|---|---|
| `documents` | `{ tenant_id: 1, created_at: -1 }` | Compound | Supports tenant-scoped queries |
| `chat_sessions` | `{ session_id: 1 }` | Unique | Fast session lookups |
| `chat_sessions` | `{ user_id: 1, updated_at: -1 }` | Compound | Recent chat list |
| `agents` | `{ agent_id: 1 }` | Unique | Primary key |
| `workflows` | `{ workflow_id: 1, version: -1 }` | Compound | Versioned workflow fetch |
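The index table above can be expressed as data and applied in one pass. The following is a minimal sketch, assuming `db` is a pymongo `Database` (any object that supports `db[name].create_index(...)` works); `INDEXES` and `ensure_indexes` are hypothetical names.

```python
# (collection, keys, options) for each row of the index table.
# 1 is ascending and -1 is descending, matching pymongo.ASCENDING / DESCENDING.
INDEXES = [
    ("documents", [("tenant_id", 1), ("created_at", -1)], {}),
    ("chat_sessions", [("session_id", 1)], {"unique": True}),
    ("chat_sessions", [("user_id", 1), ("updated_at", -1)], {}),
    ("agents", [("agent_id", 1)], {"unique": True}),
    ("workflows", [("workflow_id", 1), ("version", -1)], {}),
]


def ensure_indexes(db):
    """Request every index in INDEXES and return the names the driver reports."""
    return [db[coll].create_index(keys, **opts) for coll, keys, opts in INDEXES]
```

Keeping the definitions in one list makes it easier to review them alongside the table and to run the same setup in dev and prod.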
TTL Indexes:
- `chat_sessions` messages: expire after 90 days (`expireAfterSeconds: 7776000`).
- `audit_events` staging: expire after 30 days.
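The retention periods above convert to `expireAfterSeconds` values as shown in this sketch. It assumes a pymongo-style collection object and a date field that drives expiry; the helper names are hypothetical, and the choice of field (for example `updated_at` on `chat_sessions`) is an assumption rather than something the source specifies.

```python
from datetime import timedelta


def ttl_seconds(days):
    """Convert a retention period in days to an expireAfterSeconds value."""
    return int(timedelta(days=days).total_seconds())


def ensure_ttl_index(collection, field, days):
    """Create a TTL index on a date field of a pymongo-style collection.

    The field must hold date values; expired documents are removed by a
    background task, so deletion is not instantaneous.
    """
    return collection.create_index(field, expireAfterSeconds=ttl_seconds(days))
```

For example, `ttl_seconds(90)` gives 7776000, matching the value listed for `chat_sessions`, and `ttl_seconds(30)` gives 2592000 for `audit_events`.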
Query Profiling:
- Enable profiler threshold at `100 ms` in dev/staging to catch unindexed queries.
- Review slow logs weekly via CloudWatch Logs Insights.
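On DocumentDB the profiler is controlled through cluster parameter group settings. The sketch below builds a parameter list for the 100 ms threshold, assuming the `profiler` and `profiler_threshold_ms` cluster parameters and a payload shaped for the `Parameters` argument of boto3's `modify_db_cluster_parameter_group`; the function name and the `immediate` apply method are assumptions to check against the AWS documentation.

```python
def profiler_parameters(threshold_ms=100):
    """Build a Parameters list that enables the profiler at the given threshold."""
    return [
        {
            "ParameterName": "profiler",
            "ParameterValue": "enabled",
            "ApplyMethod": "immediate",  # assumption: treated as a dynamic parameter
        },
        {
            "ParameterName": "profiler_threshold_ms",
            "ParameterValue": str(threshold_ms),  # parameter values are passed as strings
            "ApplyMethod": "immediate",
        },
    ]
```

Keeping the threshold as a function argument lets dev and staging share one definition while leaving room for a different value elsewhere.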
Monitoring & Alerting
CloudWatch Metrics:
| Metric | Threshold | Severity |
|---|---|---|
| `CPUUtilization` | > 80% for 5 min | Warning |
| `CPUUtilization` | > 95% for 2 min | Critical |
| `FreeableMemory` | < 10% of total | Critical |
| `DatabaseConnections` | > 400 (prod) | Warning |
| `WriteLatency` | p99 > 20 ms | Warning |
| `ReadLatency` | p99 > 10 ms | Warning |
| `VolumeBytesUsed` | > 80% of provisioned | Warning |
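The two `CPUUtilization` rows describe sustained conditions, which CloudWatch alarms evaluate natively. Purely to illustrate the semantics, the sketch below classifies a series of one-minute CPU samples against those two rules; the names `CPU_RULES` and `cpu_severity` are hypothetical, and one sample per minute is assumed.

```python
# Sustained-CPU rules from the table: (threshold %, minutes, severity),
# ordered from most to least severe so the stronger alert wins.
CPU_RULES = [(95.0, 2, "Critical"), (80.0, 5, "Warning")]


def cpu_severity(per_minute_cpu):
    """Classify CPU from one-minute samples, oldest first.

    A rule fires when each of the most recent `minutes` samples is above
    its threshold. Returns "Critical", "Warning", or None.
    """
    for threshold, minutes, severity in CPU_RULES:
        window = per_minute_cpu[-minutes:]
        if len(window) == minutes and all(v > threshold for v in window):
            return severity
    return None
```

A single spike does not trigger either rule: the window must be full and every sample in it must exceed the threshold.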
Log Exports:
- Audit logs → CloudWatch Logs → OpenSearch for SIEM correlation.
- Profiler logs → weekly Athena query for index recommendations.