OpenSearch

Hybrid search engine combining BM25 keyword search with KNN semantic search.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        Application Layer                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │   RAG API    │  │  Ingestion   │  │   Analytics  │              │
│  │  (hybrid)    │  │   Pipeline   │  │   Dashboard  │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
          │ search / msearch  │ bulk index      │ aggregation queries
          │ (opensearch-py)   │ (opensearch-py) │ (opensearch-py)
          ▼                   ▼                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│              Amazon OpenSearch 2.11 — Managed Cluster                 │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │ Master Node │    │   Data Node 1   │    │   Data Node 2   │     │
│  │ (dedicated) │    │   (r6g.2xlarge) │    │   (r6g.2xlarge) │     │
│  └─────────────┘    └─────────────────┘    └─────────────────┘     │
│                              ▲                    ▲                 │
│                              └────────────────────┘                 │
│                                   Data Node 3                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Indexes: documents │ sessions │ sops │ audit-logs-*        │   │
│  │  Snapshots ──► S3 (every 6 hours)                          │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Cluster Configuration

Version: OpenSearch 2.11
Instance: r6g.2xlarge.search (prod), t3.medium.search (dev)
Nodes: 3 data nodes, 1 master node
Indexes:
- documents — Ingested document chunks
- sessions — Session-scoped chat context
- sops — Standard operating procedures

Index Mapping

{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "text": { "type": "text", "analyzer": "standard" },
      "embedding": { "type": "knn_vector", "dimension": 384 },
      "metadata": { "type": "object" }
    }
  }
}
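As a rough sketch, the mapping could be created from Python as shown here. The HNSW values are taken from the KNN Tuning section of this page; the `cosinesimil` space type and the helper names are assumptions for illustration, not confirmed project settings.

```python
def build_documents_index_body(dimension=384, ef_construction=128, m=16, ef_search=256):
    """Build an index-creation body for the `documents` index."""
    return {
        "settings": {
            "index": {
                "knn": True,  # approximate k-NN must be enabled per index
                "knn.algo_param.ef_search": ef_search,
            }
        },
        "mappings": {
            "properties": {
                "text": {"type": "text", "analyzer": "standard"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "nmslib",
                        "space_type": "cosinesimil",  # assumption: not stated on this page
                        "parameters": {"ef_construction": ef_construction, "m": m},
                    },
                },
                "metadata": {"type": "object"},
            }
        },
    }


def create_documents_index(client, index_name="documents"):
    """Create the index if it is missing; `client` is an opensearch-py client."""
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name, body=build_documents_index_body())
    return True
```

The embedding dimension has to match the embedding model's output size, since a `knn_vector` field rejects vectors of any other length.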

Connection Patterns & Client Libraries

Primary Driver: opensearch-py — version >=2.4.0

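import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Placeholders: substitute the real domain endpoint and region.
host = "search-example.us-east-1.es.amazonaws.com"
region = "us-east-1"

# Sign requests with SigV4 for the "es" service using the caller's IAM credentials.
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "es",
                   session_token=credentials.token)

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=30,             # seconds per request
    max_retries=3,
    retry_on_timeout=True,
)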

Authentication:

IAM-based fine-grained access control (FGAC) in production.
Master user / password for dev and local testing.
Requests signed with AWS4Auth using the es service scope.

Search Patterns:

Pattern	Endpoint	Use Case
BM25 Lexical	`/_search` with `match` or `match_phrase` query	Keyword filtering (`match`), exact phrase matching (`match_phrase`)
KNN Semantic	`/_search` with `knn` clause	Semantic similarity, embedding search
Hybrid	`/_search` with `script_score` + `knn`	RAG retrieval — combine BM25 + KNN
Bulk Ingest	`/_bulk`	Batch document indexing from ingestion pipeline
Update by Query	`/_update_by_query`	Mass metadata updates, TTL enforcement
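The first three patterns in the table can be sketched as request bodies. The field names come from the Index Mapping section; the function names are hypothetical, and the hybrid variant is only one reading of "script_score + knn": it uses the k-NN scoring script with a `match` query as a pre-filter, so BM25 narrows the candidates and the final score is the vector similarity. The `cosinesimil` space type is an assumption.

```python
def bm25_query(text, size=10):
    """Lexical search over the `text` field."""
    return {"size": size, "query": {"match": {"text": text}}}


def knn_query(vector, k=10):
    """Approximate nearest-neighbor search over the `embedding` field."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(vector), "k": k}}},
    }


def hybrid_query(text, vector, size=10, space_type="cosinesimil"):
    """Exact k-NN scoring restricted to documents matching `text`."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match": {"text": text}},  # BM25 acts as the candidate filter
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": "embedding",
                        "query_value": list(vector),
                        "space_type": space_type,  # assumption: not stated on this page
                    },
                },
            }
        },
    }
```

Each body would be passed as `client.search(index="documents", body=...)`. Blending BM25 and vector scores numerically is a separate design choice that this sketch does not attempt.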

Index Naming:

Time-series indices sit behind a rollover alias and are numbered sequentially: audit-logs-000001, audit-logs-000002.
ISM (Index State Management, the OpenSearch counterpart of Elasticsearch ILM) migrates indices to UltraWarm storage after 7 days.
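A lifecycle policy along these lines might look like the following sketch. The 7-day threshold and the `audit-logs-*` pattern come from this page; the policy ID, the 1-day rollover age, and the function names are assumptions made for illustration.

```python
def build_audit_log_policy(warm_after="7d", rollover_after="1d"):
    """Build a policy body: roll over while hot, then migrate to UltraWarm."""
    return {
        "policy": {
            "description": "Roll over audit logs, then migrate to UltraWarm",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # assumption: rollover age is not specified on this page
                    "actions": [{"rollover": {"min_index_age": rollover_after}}],
                    "transitions": [
                        {"state_name": "warm", "conditions": {"min_index_age": warm_after}}
                    ],
                },
                {"name": "warm", "actions": [{"warm_migration": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": ["audit-logs-*"], "priority": 100},
        }
    }


def put_audit_log_policy(client, policy_id="audit-logs-policy"):
    """Upload the policy; `policy_id` is a hypothetical name."""
    return client.transport.perform_request(
        "PUT", f"/_plugins/_ism/policies/{policy_id}", body=build_audit_log_policy()
    )
```

The rollover action also depends on each index carrying its rollover alias setting, which is not shown here.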

Backup & Disaster Recovery

Strategy	Frequency	Retention	RPO	RTO
Automated Snapshots	Every 6 hours	14 days	6 hours	< 1 hour
Manual Snapshots	Pre-release	Indefinite	Zero	< 1 hour
Cross-Cluster Replication	Near-real-time (prod)	N/A	Minutes	< 30 minutes
Index-Level Exports	Weekly (critical indices)	30 days	N/A	Hours

DR Runbook:

1. Verify the latest automated snapshot in the S3 repository: GET _snapshot/{repo}/_all.
2. For a partial failure, restore the affected index: POST _snapshot/{repo}/{snap}/_restore with an indices filter.
3. For full-cluster DR, restore to a new domain in the DR region, then update the Route 53 alias.
4. Validate cluster health: GET _cluster/health must report green before traffic is shifted.
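The partial-restore and health-validation steps could be scripted roughly as follows. This is a sketch with hypothetical function names and timeout values; `client` is assumed to be an opensearch-py client, and the repository and snapshot names are supplied by the operator.

```python
import time


def restore_index(client, repo, snapshot, index_name):
    """Restore a single index from a snapshot, leaving cluster state untouched."""
    body = {"indices": index_name, "include_global_state": False}
    return client.snapshot.restore(repository=repo, snapshot=snapshot, body=body)


def wait_for_green(client, timeout_s=600, poll_s=10, sleep=time.sleep):
    """Poll cluster health until it is green; return False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.cluster.health()["status"]
        if status == "green":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A restore fails if an open index with the same name already exists, so the target index has to be closed or deleted first, or the restore has to rename it. Traffic should only be shifted once `wait_for_green` returns `True`.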

Performance Tuning

Shard Strategy:

Index	Primary Shards	Replicas	Notes
`documents`	3	1	Balanced across 3 data nodes
`sessions`	1	1	Smaller dataset, minimize overhead
`sops`	1	1	Infrequently updated
`audit-logs-*`	1	0	Time-series, replica on next index
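The table can be kept as data and turned into index settings, which also makes the node-balance arithmetic easy to check. This is a sketch: the dictionary and function names are hypothetical, and the numbers are copied from the table.

```python
from fnmatch import fnmatch

# Shard plan copied from the table above.
SHARD_PLAN = {
    "documents": {"number_of_shards": 3, "number_of_replicas": 1},
    "sessions": {"number_of_shards": 1, "number_of_replicas": 1},
    "sops": {"number_of_shards": 1, "number_of_replicas": 1},
    "audit-logs-*": {"number_of_shards": 1, "number_of_replicas": 0},
}


def index_settings(index_name):
    """Return the settings block for an index, matching wildcard patterns."""
    for pattern, settings in SHARD_PLAN.items():
        if fnmatch(index_name, pattern):
            return {"index": dict(settings)}
    raise KeyError(f"no shard plan for {index_name}")


def shard_copies(index_name):
    """Total shard copies: primaries multiplied by (1 + replicas)."""
    s = index_settings(index_name)["index"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])
```

For `documents`, 3 primaries with 1 replica each give 6 shard copies, which is 2 per node across the 3 data nodes.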

KNN Tuning:

Engine: nmslib for approximate nearest neighbors (ANN).
ef_construction: 128 (index-time quality vs. speed tradeoff).
ef_search: 256 (query-time recall boost).
m: 16 (graph connectivity).
Refresh interval: 30s during bulk ingest, 1s during search-heavy workloads.
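Switching the refresh interval around a bulk load can be wrapped in a context manager so that the search-time value is restored even if ingestion fails. This is a minimal sketch; the function name is hypothetical and `client` is assumed to be an opensearch-py client.

```python
from contextlib import contextmanager


@contextmanager
def bulk_refresh_interval(client, index_name, ingest_interval="30s", search_interval="1s"):
    """Relax refresh during bulk ingest, then restore the search-time interval."""
    client.indices.put_settings(
        index=index_name, body={"index": {"refresh_interval": ingest_interval}}
    )
    try:
        yield
    finally:
        # Runs on success and on failure, so searches regain 1s freshness.
        client.indices.put_settings(
            index=index_name, body={"index": {"refresh_interval": search_interval}}
        )
```

A typical use would wrap the ingestion call, for example `with bulk_refresh_interval(client, "documents"): helpers.bulk(client, actions)`, where `helpers` is `opensearchpy.helpers` and `actions` is the pipeline's iterable of documents.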

Caching:

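Enable the shard request cache for repeated terms aggregations.
Set indices.requests.cache.size to 5% of heap; this setting sizes the request cache, not the node query (filter) cache, which is governed by indices.queries.cache.size.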
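Aggregation-only searches (`size: 0`) are the requests whose results OpenSearch keeps in the shard request cache, so a repeated terms aggregation might be issued as in this sketch. The function names are hypothetical, and the example field in the usage note is an assumption, since the `metadata` object has no declared subfields on this page.

```python
def terms_aggregation_body(field, bucket_count=20):
    """Aggregation-only request: size 0 returns buckets and no hits."""
    return {
        "size": 0,
        "aggs": {"by_term": {"terms": {"field": field, "size": bucket_count}}},
    }


def cached_terms_aggregation(client, index_name, field):
    """Run the aggregation and explicitly opt in to the shard request cache."""
    return client.search(
        index=index_name,
        body=terms_aggregation_body(field),
        request_cache=True,
    )
```

For example, `cached_terms_aggregation(client, "documents", "metadata.source.keyword")` would count documents per source, provided such a keyword field exists.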

JVM & Heap:

Heap capped at 32 GB (compressed OOPs boundary).
G1GC enabled; monitor the old-generation collection count (jvm.gc.collectors.old.collection_count in node stats) for pressure.

Monitoring & Alerting

CloudWatch Metrics:

Metric	Threshold	Severity
`ClusterStatus.red`	> 0	Critical
`ClusterStatus.yellow`	> 0 for 10 min	Warning
`CPUUtilization`	> 80% for 5 min	Warning
`JVMMemoryPressure`	> 85%	Warning
`JVMMemoryPressure`	> 95%	Critical
`SearchLatency`	p99 > 500 ms	Warning
`IndexingLatency`	p99 > 200 ms	Warning
`ShardCount`	> 900 per node	Warning
`MasterReachableFromNode`	< 1 (any node)	Critical
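Several rows of this table map directly onto CloudWatch alarm definitions. The sketch below builds keyword arguments for boto3's `put_metric_alarm`; the function name, alarm naming scheme, SNS topic ARN, and one-minute period are assumptions, and the p99 latency rows are left out because they need an extended statistic rather than a plain one.

```python
def alarm_kwargs(domain_name, account_id, metric, threshold, minutes, severity,
                 sns_topic_arn, statistic="Maximum",
                 comparison="GreaterThanThreshold"):
    """Build put_metric_alarm arguments for one Amazon OpenSearch Service metric."""
    return {
        "AlarmName": f"{domain_name}-{metric}-{severity}",  # hypothetical naming scheme
        "Namespace": "AWS/ES",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": statistic,
        "Period": 60,                  # one-minute datapoints (assumption)
        "EvaluationPeriods": minutes,  # "for N min" becomes N consecutive periods
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        "AlarmActions": [sns_topic_arn],
    }


def table_alarms(domain_name, account_id, critical_arn, warning_arn):
    """Alarm definitions for the non-percentile rows of the table."""
    mk = lambda *a, **k: alarm_kwargs(domain_name, account_id, *a, **k)
    return [
        mk("ClusterStatus.red", 0, 1, "critical", critical_arn),
        mk("ClusterStatus.yellow", 0, 10, "warning", warning_arn),
        mk("CPUUtilization", 80, 5, "warning", warning_arn),
        mk("JVMMemoryPressure", 85, 1, "warning", warning_arn),
        mk("JVMMemoryPressure", 95, 1, "critical", critical_arn),
        mk("MasterReachableFromNode", 1, 1, "critical", critical_arn,
           statistic="Minimum", comparison="LessThanThreshold"),
    ]
```

Each dictionary would be applied with `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`.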

Cluster Health Checks:

curl -s https://$DOMAIN/_cluster/health | jq '.status, .unassigned_shards'

green: all primary and replica shards allocated.
yellow: replica shards unallocated (acceptable during rolling restarts).
red: primary shards unallocated — immediate escalation.
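The three status definitions can be turned into a small check over the `_cluster/health` response. This is a sketch only: the function name and the `rolling_restart` flag are hypothetical, and the severity labels mirror the ones used elsewhere on this page.

```python
def classify_health(health, rolling_restart=False):
    """Map a _cluster/health response to a severity label."""
    status = health["status"]
    if status == "green":
        return "ok"
    if status == "yellow":
        # Unallocated replicas are tolerated while nodes restart in turn.
        return "ok" if rolling_restart else "warning"
    return "critical"  # red: at least one primary shard is unallocated
```

With an opensearch-py client, the input would be `client.cluster.health()`, the Python equivalent of the curl call above.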

Alert Routing:

Critical alerts → PagerDuty (on-call rotation).
Warning alerts → Slack #sentinel-alerts.
