OpenSearch

Hybrid search engine combining BM25 keyword search with KNN semantic search.


Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        Application Layer                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │   RAG API    │  │  Ingestion   │  │   Analytics  │              │
│  │  (hybrid)    │  │   Pipeline   │  │   Dashboard  │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
          │ search / msearch  │ bulk index      │ aggregation queries
          │ (opensearch-py)   │ (opensearch-py) │ (opensearch-py)
          ▼                   ▼                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│              Amazon OpenSearch 2.11 — Managed Cluster                 │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │ Master Node │    │   Data Node 1   │    │   Data Node 2   │     │
│  │ (dedicated) │    │   (r6g.2xlarge) │    │   (r6g.2xlarge) │     │
│  └─────────────┘    └─────────────────┘    └─────────────────┘     │
│                              ▲                    ▲                 │
│                              └────────────────────┘                 │
│                                   Data Node 3                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Indexes: documents │ sessions │ sops │ audit-logs-*        │   │
│  │  Snapshots ──► S3 (automated daily)                        │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Cluster Configuration

  • Version: OpenSearch 2.11
  • Instance: r6g.2xlarge.search (prod), t3.medium.search (dev)
  • Nodes: 3 data nodes, 1 master node
  • Indexes:
    • documents: ingested document chunks
    • sessions: session-scoped chat context
    • sops: standard operating procedures
    • audit-logs-*: time-series audit logs

Index Mapping

{
  "mappings": {
    "properties": {
      "text": { "type": "text", "analyzer": "standard" },
      "embedding": { "type": "knn_vector", "dimension": 384 },
      "metadata": { "type": "object" }
    }
  }
}
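
The mapping above does not show the index settings or the HNSW method parameters listed under KNN Tuning. The following is a minimal sketch of how they might be combined into one create-index body. The function name and the cosinesimil space type are assumptions for illustration; the dimension, engine, ef_construction, m, and ef_search values come from this page.

```python
def build_documents_index_body(dimension=384, ef_construction=128, m=16, ef_search=256):
    """Return a create-index body for a k-NN enabled index (nmslib HNSW)."""
    return {
        "settings": {
            "index": {
                "knn": True,  # enables approximate k-NN on this index
                "knn.algo_param.ef_search": ef_search,  # query-time setting for nmslib
            }
        },
        "mappings": {
            "properties": {
                "text": {"type": "text", "analyzer": "standard"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "nmslib",
                        "space_type": "cosinesimil",  # assumption: not stated on this page
                        "parameters": {"ef_construction": ef_construction, "m": m},
                    },
                },
                "metadata": {"type": "object"},
            }
        },
    }
```

With an opensearch-py client, the body could be passed as client.indices.create(index="documents", body=build_documents_index_body()).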

Connection Patterns & Client Libraries

Primary Driver: opensearch-py — version >=2.4.0

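import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Placeholders: substitute the real domain endpoint (no https://) and region.
host = "<domain-endpoint>"
region = "<aws-region>"

# Sign requests with SigV4 under the "es" service scope.
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(
    credentials.access_key, credentials.secret_key, region, "es",
    session_token=credentials.token,
)

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=30,
    max_retries=3,
    retry_on_timeout=True,
)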

Authentication:

  • IAM-based fine-grained access control (FGAC) in production.
  • Master user / password for dev and local testing.
  • Requests signed with AWS4Auth using the es service scope.

Search Patterns:

Pattern          Endpoint                             Use Case
BM25 Lexical     /_search with match query            Keyword filtering, exact phrase matching
KNN Semantic     /_search with knn clause             Semantic similarity, embedding search
Hybrid           /_search with script_score + knn     RAG retrieval (combines BM25 + KNN)
Bulk Ingest      /_bulk                               Batch document indexing from ingestion pipeline
Update by Query  /_update_by_query                    Mass metadata updates, TTL enforcement
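
To make the hybrid pattern concrete, here is a hedged sketch of a request body that runs a BM25 match and a knn clause in a single /_search call. It uses a plain bool/should combination as a simpler stand-in, not the script_score approach named in the table, so the two scores are summed without normalization. The function name and defaults are hypothetical; the field names text and embedding come from the index mapping.

```python
def build_hybrid_query(query_text, query_vector, k=10, size=10):
    """Build a /_search body that combines a BM25 match with a k-NN clause."""
    if not query_vector:
        raise ValueError("query_vector must be non-empty")
    return {
        "size": size,
        "_source": ["text", "metadata"],  # skip returning the large embedding field
        "query": {
            "bool": {
                # A document can match either clause; matching both adds the scores.
                "should": [
                    {"match": {"text": {"query": query_text}}},
                    {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
                ]
            }
        },
    }
```

With an opensearch-py client this could be sent as client.search(index="documents", body=build_hybrid_query(text, vector)). The vector must have the same dimension as the mapping (384 here).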

Index Naming:

  • Time-series indices are written through a rollover alias; the backing indices are numbered audit-logs-000001, audit-logs-000002, and so on.
  • ISM (Index State Management) policies migrate indices from hot storage to UltraWarm after 7 days.
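
On OpenSearch, lifecycle rules of this kind are expressed as Index State Management (ISM) policies. The sketch below shows what a policy for the audit-logs-* pattern might look like, assuming the Amazon OpenSearch Service warm_migration action. The function name, the state names, and the 1-day rollover age are assumptions; only the 7-day move to UltraWarm comes from this page.

```python
def build_audit_log_policy(warm_after="7d", rollover_age="1d"):
    """Return an ISM policy body: roll over while hot, then migrate to UltraWarm."""
    return {
        "policy": {
            "description": "Roll over audit logs, then migrate to UltraWarm",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # rollover_age is an assumed value, not taken from this page
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {"state_name": "warm", "conditions": {"min_index_age": warm_after}}
                    ],
                },
                {
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],  # Amazon OpenSearch Service action
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to new indices matching the pattern.
            "ism_template": {"index_patterns": ["audit-logs-*"], "priority": 100},
        }
    }
```

The body would be submitted with PUT _plugins/_ism/policies/{policy_id}, where the policy ID is chosen by the operator.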

Backup & Disaster Recovery

Strategy                   Frequency                  Retention   RPO      RTO
Automated Snapshots        Every 6 hours              14 days     6 hours  < 1 hour
Manual Snapshots           Pre-release                Indefinite  Zero     < 1 hour
Cross-Cluster Replication  Near-real-time (prod)      N/A         Minutes  < 30 minutes
Index-Level Exports        Weekly (critical indices)  30 days     N/A      Hours

DR Runbook:

  1. Verify latest automated snapshot in S3 repository: GET _snapshot/{repo}/_all.
  2. For partial failure, restore specific index: POST _snapshot/{repo}/{snap}/_restore with indices filter.
  3. For full cluster DR, restore to a new domain in the DR region, then update Route 53 alias.
  4. Validate cluster health: GET _cluster/health must reach green before traffic shift.
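
Steps 2 and 4 of the runbook can be sketched as two small helpers: one builds the restore request body with an indices filter, the other decides whether a _cluster/health response permits the traffic shift. Both function names and the optional rename suffix are hypothetical additions for illustration.

```python
def build_restore_body(indices, rename_suffix=None):
    """Build the body for POST _snapshot/{repo}/{snap}/_restore."""
    body = {
        "indices": ",".join(indices),    # restore only the listed indices
        "include_global_state": False,   # leave cluster-wide state untouched
    }
    if rename_suffix:
        # Restore next to the live index instead of colliding with it.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = "$1" + rename_suffix
    return body


def ready_for_traffic(health):
    """Return True only when the cluster is green with no unassigned shards."""
    return health.get("status") == "green" and health.get("unassigned_shards", 0) == 0
```

With opensearch-py, the body could be passed to client.snapshot.restore(repository=repo, snapshot=snap, body=...), and ready_for_traffic could be applied to the result of client.cluster.health().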

Performance Tuning

Shard Strategy:

Index         Primary Shards  Replicas  Notes
documents     3               1         Balanced across 3 data nodes
sessions      1               1         Smaller dataset, minimize overhead
sops          1               1         Infrequently updated
audit-logs-*  1               0         Time-series, replica on next index

KNN Tuning:

  • Engine: nmslib for approximate nearest neighbors (ANN).
  • ef_construction: 128 (index-time quality vs. speed tradeoff).
  • ef_search: 256 (query-time recall boost).
  • m: 16 (graph connectivity).
  • Refresh interval: 30s during bulk ingest, 1s during search-heavy workloads.
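
The refresh-interval switch can be sketched as a settings body applied before and after a bulk load, together with a generator that shapes chunks into bulk actions. The function names and the chunk dictionary layout (id, text, embedding, metadata) are assumptions for illustration; the 30s and 1s values come from the list above.

```python
def refresh_settings(bulk_ingest):
    """Return an index settings body: 30s while bulk loading, 1s otherwise."""
    interval = "30s" if bulk_ingest else "1s"
    return {"index": {"refresh_interval": interval}}


def bulk_actions(chunks, index="documents"):
    """Yield one bulk 'index' action per chunk (hypothetical chunk layout)."""
    for chunk in chunks:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": chunk["id"],
            "_source": {
                "text": chunk["text"],
                "embedding": chunk["embedding"],
                "metadata": chunk.get("metadata", {}),
            },
        }
```

A typical sequence with opensearch-py would be client.indices.put_settings(index="documents", body=refresh_settings(True)), then opensearchpy.helpers.bulk(client, bulk_actions(chunks)), then put_settings again with refresh_settings(False).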

Caching:

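  • Enable the shard request cache for repeated terms aggregations; it caches the results of size=0 requests such as aggregations.
  • Set indices.requests.cache.size to 5% of heap. This sizes the request cache; the node query (filter) cache is sized separately through indices.queries.cache.size.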

JVM & Heap:

  • Heap capped at 32 GB (compressed OOPs boundary).
  • G1GC enabled; monitor the old-generation collection count (jvm.gc.collectors.old.collection_count in node stats) for pressure.

Monitoring & Alerting

CloudWatch Metrics:

Metric                   Threshold           Severity
ClusterStatus.red        > 0                 Critical
ClusterStatus.yellow     > 0 for 10 min      Warning
CPUUtilization           > 80% for 5 min     Warning
JVMMemoryPressure        > 85%               Warning
JVMMemoryPressure        > 95%               Critical
SearchLatency p99        > 500 ms            Warning
IndexingLatency p99      > 200 ms            Warning
ShardCount               > 900 per node      Warning
MasterReachableFromNode  < 1 (any node)      Critical

Cluster Health Checks:

curl -s https://$DOMAIN/_cluster/health | jq '.status, .unassigned_shards'
  • green: all primary and replica shards allocated.
  • yellow: replica shards unallocated (acceptable during rolling restarts).
  • red: primary shards unallocated — immediate escalation.
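
The status meanings above map naturally onto the alert severities used on this page. A minimal sketch, assuming a hypothetical helper that takes the parsed _cluster/health response and treats a missing or unknown status as critical:

```python
def classify_health(health):
    """Map a _cluster/health response to 'ok', 'warning', or 'critical'."""
    status = health.get("status")
    if status == "green":
        return "ok"
    if status == "yellow":
        return "warning"   # replicas unallocated; tolerable during rolling restarts
    return "critical"      # red, or the status is missing or unrecognized
```

With opensearch-py, the input could come from client.cluster.health(); the 10-minute window applied to yellow alerts in CloudWatch is not modeled here.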

Alert Routing:

  • Critical alerts → PagerDuty (on-call rotation).
  • Warning alerts → Slack #sentinel-alerts.