OpenSearch
Vector and lexical search backend for hybrid RAG retrieval.
Hybrid search engine combining BM25 keyword search with KNN semantic search.
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ RAG API │ │ Ingestion │ │ Analytics │ │
│ │ (hybrid) │ │ Pipeline │ │ Dashboard │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼──────────────────────┘
│ search / msearch │ bulk index │ aggregation queries
│ (opensearch-py) │ (opensearch-py) │ (opensearch-py)
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ Amazon OpenSearch 2.11 — Managed Cluster │
│ ┌─────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Master Node │ │ Data Node 1 │ │ Data Node 2 │ │
│ │ (dedicated) │ │ (r6g.2xlarge) │ │ (r6g.2xlarge) │ │
│ └─────────────┘ └─────────────────┘ └─────────────────┘ │
│ ▲ ▲ │
│ └────────────────────┘ │
│ Data Node 3 │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Indexes: documents │ sessions │ sops │ audit-logs-* │ │
│ │ Snapshots ──► S3 (automated daily) │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Cluster Configuration
- Version: OpenSearch 2.11
- Instance: r6g.2xlarge.search (prod), t3.medium.search (dev)
- Nodes: 3 data nodes, 1 master node
- Indexes:
  - `documents`: Ingested document chunks
  - `sessions`: Session-scoped chat context
  - `sops`: Standard operating procedures
Index Mapping
{
  "mappings": {
    "properties": {
      "text": { "type": "text", "analyzer": "standard" },
      "embedding": { "type": "knn_vector", "dimension": 384 },
      "metadata": { "type": "object" }
    }
  }
}
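As a hedged illustration of how a chunk that fits this mapping could be prepared for the `/_bulk` endpoint, the sketch below validates the three mapped fields and serializes documents as NDJSON. The helper names (`validate_chunk`, `to_bulk_ndjson`) are hypothetical and not part of the ingestion pipeline described here; only the field names and the 384 dimension come from the mapping.

```python
import json
from typing import Any, Dict, Iterable, List

EMBEDDING_DIM = 384  # matches "dimension" in the mapping above


def validate_chunk(doc: Dict[str, Any]) -> None:
    """Raise ValueError if a chunk does not fit the mapping (hypothetical helper)."""
    if not isinstance(doc.get("text"), str):
        raise ValueError("text must be a string")
    embedding = doc.get("embedding")
    if not isinstance(embedding, list) or len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"embedding must be a list of length {EMBEDDING_DIM}")
    if not isinstance(doc.get("metadata", {}), dict):
        raise ValueError("metadata must be an object")


def to_bulk_ndjson(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Serialize chunks as an NDJSON payload for POST /_bulk."""
    lines: List[str] = []
    for doc in docs:
        validate_chunk(doc)
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string could be passed as the request body of a bulk call; rejecting a wrong-sized embedding before the request is sent avoids a partial bulk failure on the cluster side.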
Connection Patterns & Client Libraries
Primary Driver: opensearch-py — version >=2.4.0
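import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Placeholders: set these to the domain endpoint and AWS region in use.
host = "<domain-endpoint>"
region = "<aws-region>"

# Sign requests with SigV4 using the "es" service scope.
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "es", session_token=credentials.token)

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=30,            # seconds per request
    max_retries=3,
    retry_on_timeout=True,
)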
Authentication:
- IAM-based fine-grained access control (FGAC) in production.
- Master user / password for dev and local testing.
- Requests signed with `AWS4Auth` using the `es` service scope.
Search Patterns:
| Pattern | Endpoint | Use Case |
|---|---|---|
| BM25 Lexical | `/_search` with `match` query | Keyword filtering, exact phrase matching |
| KNN Semantic | `/_search` with `knn` clause | Semantic similarity, embedding search |
| Hybrid | `/_search` with `script_score` + `knn` | RAG retrieval: combine BM25 + KNN |
| Bulk Ingest | `/_bulk` | Batch document indexing from ingestion pipeline |
| Update by Query | `/_update_by_query` | Mass metadata updates, TTL enforcement |
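The first two patterns in the table can be sketched as plain request bodies. For the hybrid case, the sketch below does not reproduce the `script_score` approach named in the table (its exact form is not shown here); it swaps in client-side reciprocal rank fusion (RRF) over the two result lists, which is one common alternative. The function names and the constant `c = 60` are assumptions for illustration only.

```python
from typing import Any, Dict, List, Sequence


def bm25_body(query_text: str, size: int = 10) -> Dict[str, Any]:
    """Lexical search: match query against the analyzed `text` field."""
    return {"size": size, "query": {"match": {"text": query_text}}}


def knn_body(vector: Sequence[float], k: int = 10) -> Dict[str, Any]:
    """Semantic search: approximate k-NN query against the `embedding` field."""
    return {"size": k, "query": {"knn": {"embedding": {"vector": list(vector), "k": k}}}}


def rrf_fuse(rankings: Sequence[Sequence[str]], c: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (c + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Each body could be sent with `client.search(index="documents", body=...)`, and the document ids from the two responses passed to `rrf_fuse`. RRF needs only ranks, so it sidesteps the problem that BM25 and vector scores live on different scales.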
Index Naming:
- Time-series indices sit behind a rollover alias and are numbered sequentially: `audit-logs-000001`, `audit-logs-000002`.
- ISM (Index State Management, the OpenSearch counterpart of Elasticsearch ILM) migrates indices to UltraWarm storage after 7 days.
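On OpenSearch, a lifecycle rule like the 7-day move to UltraWarm is typically expressed as an Index State Management policy. The sketch below builds such a policy body as a Python dict. The function name, the description text, and the template priority are assumptions; only the index pattern and the 7-day age come from the text above.

```python
from typing import Any, Dict


def warm_after_policy(index_pattern: str, min_age: str = "7d") -> Dict[str, Any]:
    """Build an ISM policy body that migrates matching indices to UltraWarm."""
    return {
        "policy": {
            "description": f"Move {index_pattern} to UltraWarm after {min_age}",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    # Leave the hot state once the index is old enough.
                    "transitions": [
                        {"state_name": "warm", "conditions": {"min_index_age": min_age}}
                    ],
                },
                {
                    "name": "warm",
                    # warm_migration is the Amazon OpenSearch Service action for UltraWarm.
                    "actions": [{"warm_migration": {}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }
```

A body like this would be sent with `PUT _plugins/_ism/policies/<policy-id>`, where the policy id is chosen by the operator.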
Backup & Disaster Recovery
| Strategy | Frequency | Retention | RPO | RTO |
|---|---|---|---|---|
| Automated Snapshots | Every 6 hours | 14 days | 6 hours | < 1 hour |
| Manual Snapshots | Pre-release | Indefinite | Zero | < 1 hour |
| Cross-Cluster Replication | Near-real-time (prod) | N/A | Minutes | < 30 minutes |
| Index-Level Exports | Weekly (critical indices) | 30 days | N/A | Hours |
DR Runbook:
- Verify latest automated snapshot in S3 repository: `GET _snapshot/{repo}/_all`.
- For partial failure, restore specific index: `POST _snapshot/{repo}/{snap}/_restore` with `indices` filter.
- For full cluster DR, restore to a new domain in the DR region, then update Route 53 alias.
- Validate cluster health: `GET _cluster/health` must reach `green` before traffic shift.
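The partial-restore and health-gate steps of the runbook can be sketched as two small helpers. Both function names are hypothetical, and setting `include_global_state` to false is an assumption (it keeps a single-index restore from touching cluster-wide state); the path and the `indices` filter follow the runbook.

```python
from typing import Any, Dict, Sequence, Tuple


def restore_request(repo: str, snapshot: str, indices: Sequence[str]) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for restoring selected indices from a snapshot."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore"
    body = {
        "indices": ",".join(indices),   # restore only these indices
        "include_global_state": False,  # assumption: leave cluster state alone
    }
    return "POST", path, body


def ready_for_traffic(health: Dict[str, Any]) -> bool:
    """Gate the traffic shift on a green status from GET _cluster/health."""
    return health.get("status") == "green"
```

With opensearch-py, the same restore could be issued through `client.snapshot.restore(repository=..., snapshot=..., body=...)`, and the gate applied to the dict returned by `client.cluster.health()`.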
Performance Tuning
Shard Strategy:
| Index | Primary Shards | Replicas | Notes |
|---|---|---|---|
| `documents` | 3 | 1 | Balanced across 3 data nodes |
| `sessions` | 1 | 1 | Smaller dataset, minimize overhead |
| `sops` | 1 | 1 | Infrequently updated |
| `audit-logs-*` | 1 | 0 | Time-series, replica on next index |
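To see what the table implies for the cluster, the total shard count is primaries × (1 + replicas) per index, summed over all indices. The sketch below works that arithmetic through; the number of live `audit-logs-*` indices is not stated in this document, so it is a parameter with an assumed default of 1.

```python
from typing import Dict, Tuple

# index name -> (primary shards, replicas), copied from the table above
SHARD_PLAN: Dict[str, Tuple[int, int]] = {
    "documents": (3, 1),
    "sessions": (1, 1),
    "sops": (1, 1),
    "audit-logs-*": (1, 0),  # applies to each rollover index
}


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so one index holds p * (1 + r) shards."""
    return primaries * (1 + replicas)


def cluster_shard_total(plan: Dict[str, Tuple[int, int]], audit_indices: int = 1) -> int:
    """Sum shards over the plan; wildcard entries count once per live index."""
    total = 0
    for name, (primaries, replicas) in plan.items():
        copies = audit_indices if name.endswith("*") else 1
        total += copies * total_shards(primaries, replicas)
    return total
```

With a single audit index this gives 6 + 2 + 2 + 1 = 11 shards, roughly 4 per data node on a 3-node cluster.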
KNN Tuning:
- Engine: `nmslib` for approximate nearest neighbors (ANN).
- `ef_construction`: 128 (index-time quality vs. speed tradeoff).
- `ef_search`: 256 (query-time recall boost).
- `m`: 16 (graph connectivity).
- Refresh interval: `30s` during bulk ingest, `1s` during search-heavy workloads.
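The tuning values above do not appear in the index mapping shown earlier, so the sketch below illustrates one way they might be expressed when creating the `documents` index. It assumes the HNSW method (the graph method that `ef_construction` and `m` belong to) and an `l2` space type, neither of which is stated in this document. The shard and replica counts are taken from the shard strategy table, and the function name is hypothetical.

```python
from typing import Any, Dict


def documents_index_body(bulk_ingest: bool = False) -> Dict[str, Any]:
    """Build a create-index body carrying the k-NN tuning values listed above."""
    return {
        "settings": {
            "index": {
                "knn": True,                        # enable approximate k-NN on this index
                "knn.algo_param.ef_search": 256,    # query-time candidate list size
                "number_of_shards": 3,
                "number_of_replicas": 1,
                "refresh_interval": "30s" if bulk_ingest else "1s",
            }
        },
        "mappings": {
            "properties": {
                "text": {"type": "text", "analyzer": "standard"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 384,
                    "method": {
                        "name": "hnsw",
                        "engine": "nmslib",
                        "space_type": "l2",         # assumption: not given in the source
                        "parameters": {"ef_construction": 128, "m": 16},
                    },
                },
                "metadata": {"type": "object"},
            }
        },
    }
```

The body could be passed to `client.indices.create(index="documents", body=...)`. Because `refresh_interval` is a dynamic setting, it can be switched between the two values later without recreating the index.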
Caching:
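- Enable the shard request cache for repeated `terms` aggregations.
- Set `indices.requests.cache.size` to 5% of heap. This setting sizes the shard request cache; the filter (node query) cache is sized separately by `indices.queries.cache.size`.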
JVM & Heap:
- Heap capped at 32 GB (compressed OOPs boundary).
- G1GC enabled; monitor `jvm.gc.old.count` for pressure.
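As a rough worked example of the heap cap and the 5% cache fraction, the sketch below assumes the heap is sized at half of instance RAM up to the 32 GB cap. The half-of-RAM rule is an assumption not stated in this document; the RAM figures used are 64 GiB for r6g.2xlarge and 4 GiB for t3.medium.

```python
def heap_gib(instance_ram_gib: float, cap_gib: float = 32.0) -> float:
    """Heap size under the assumed rule: half of RAM, never above the cap."""
    return min(instance_ram_gib / 2, cap_gib)


def request_cache_gib(heap: float, fraction: float = 0.05) -> float:
    """Memory budget for the request cache at the given fraction of heap."""
    return heap * fraction


prod_heap = heap_gib(64)   # r6g.2xlarge: 64 GiB RAM
dev_heap = heap_gib(4)     # t3.medium: 4 GiB RAM
prod_cache = request_cache_gib(prod_heap)
```

Under these assumptions a production data node sits exactly at the 32 GiB cap with about 1.6 GiB available to the request cache, while a dev node has a 2 GiB heap.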
Monitoring & Alerting
CloudWatch Metrics:
| Metric | Threshold | Severity |
|---|---|---|
| `ClusterStatus.red` | > 0 | Critical |
| `ClusterStatus.yellow` | > 0 for 10 min | Warning |
| `CPUUtilization` | > 80% for 5 min | Warning |
| `JVMMemoryPressure` | > 85% | Warning |
| `JVMMemoryPressure` | > 95% | Critical |
| `SearchLatency` | p99 > 500 ms | Warning |
| `IndexingLatency` | p99 > 200 ms | Warning |
| `ShardCount` | > 900 per node | Warning |
| `MasterReachableFromNode` | < 1 (any node) | Critical |
Cluster Health Checks:
curl -s https://$DOMAIN/_cluster/health | jq '.status, .unassigned_shards'
- `green`: all primary and replica shards allocated.
- `yellow`: replica shards unallocated (acceptable during rolling restarts).
- `red`: primary shards unallocated; escalate immediately.
Alert Routing:
- Critical alerts → PagerDuty (on-call rotation).
- Warning alerts → Slack `#sentinel-alerts`.
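The thresholds and routing above can be sketched as a small rule table. This is a simplified illustration, not the actual alarm configuration: it covers a subset of the metrics, ignores the duration windows ("for 5 min", "for 10 min"), and the names `RULES`, `evaluate`, and `route` are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[float, float], bool], float, str]

# (metric, comparison, threshold, severity), taken from the CloudWatch table
RULES: List[Rule] = [
    ("ClusterStatus.red", operator.gt, 0, "Critical"),
    ("ClusterStatus.yellow", operator.gt, 0, "Warning"),
    ("CPUUtilization", operator.gt, 80, "Warning"),
    ("JVMMemoryPressure", operator.gt, 85, "Warning"),
    ("JVMMemoryPressure", operator.gt, 95, "Critical"),
    ("MasterReachableFromNode", operator.lt, 1, "Critical"),
]

ROUTES: Dict[str, str] = {
    "Critical": "PagerDuty",
    "Warning": "Slack #sentinel-alerts",
}

_RANK = {"Warning": 1, "Critical": 2}


def evaluate(samples: Dict[str, float]) -> Dict[str, str]:
    """Return the highest breached severity for each metric in `samples`."""
    breached: Dict[str, str] = {}
    for metric, compare, threshold, severity in RULES:
        if metric in samples and compare(samples[metric], threshold):
            current = breached.get(metric)
            if current is None or _RANK[severity] > _RANK[current]:
                breached[metric] = severity
    return breached


def route(severity: str) -> str:
    """Map a severity to its notification channel."""
    return ROUTES[severity]
```

Keeping the highest severity per metric means a `JVMMemoryPressure` reading of 96% pages on-call once instead of also posting a redundant warning to Slack.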