LLM Orchestrator

The unified LLM layer for Sentinel. Route across providers, enforce budgets, mask PII, and cache responses.


Supported Providers

Provider    | Models                         | Region
AWS Bedrock | Claude 3.5, Claude 3, Nova Pro | ap-south-1
OpenAI      | GPT-4o, GPT-4 Turbo, GPT-3.5   | us-east-1
Qwen        | Qwen2.5-72B                    | ap-south-1
Kimi        | Kimi k2.5                      | ap-south-1 (via Bedrock)
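
As a sketch, the routing table above might be expressed as a provider registry; the keys and model identifiers here are illustrative, not the orchestrator's actual schema:

```python
# Hypothetical provider registry mirroring the table above.
# Keys and model identifiers are illustrative, not the real schema.
PROVIDERS = {
    "bedrock": {
        "models": ["claude-3.5-sonnet", "claude-3", "nova-pro"],
        "region": "ap-south-1",
    },
    "openai": {
        "models": ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"],
        "region": "us-east-1",
    },
    "qwen": {
        "models": ["qwen2.5-72b"],
        "region": "ap-south-1",
    },
    "kimi": {
        "models": ["kimi-k2.5"],
        "region": "ap-south-1",  # served via Bedrock
    },
}
```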

Fallback Chains

Primary: Claude 3.5 Sonnet (Bedrock)
    ├──► Timeout? ──► GPT-4o (OpenAI)
    ├──► Rate limit? ──► Nova Pro (Bedrock)
    ├──► Error? ──► GPT-3.5 (OpenAI)
    └──► All fail? ──► Cached response or graceful degradation

Automatic retry with exponential backoff between providers.
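
A minimal sketch of this chain, assuming a generic call(provider, model, prompt) function and hypothetical exception types standing in for the providers' SDK errors:

```python
import time

# Hypothetical exception types; real provider SDKs raise their own.
class ProviderTimeout(Exception): pass
class ProviderRateLimited(Exception): pass

def complete(prompt, call, cached_response=None, base_delay=0.5):
    """Route per the chain above: the error type from the primary call
    selects the fallback model; total failure degrades to the cache."""
    try:
        return call("bedrock", "claude-3.5-sonnet", prompt)
    except ProviderTimeout:
        fallback = ("openai", "gpt-4o")
    except ProviderRateLimited:
        fallback = ("bedrock", "nova-pro")
    except Exception:
        fallback = ("openai", "gpt-3.5-turbo")

    for attempt in range(3):
        try:
            return call(*fallback, prompt)
        except Exception:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

    # All providers failed: cached response or graceful degradation.
    return cached_response or "Service temporarily unavailable."
```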


PII Masking

Before each LLM call, these identifiers are replaced with placeholders:

  • Aadhaar → [AADHAAR]
  • PAN → [PAN]
  • UPI → [UPI]
  • IFSC → [IFSC]
  • Mobile → [PHONE]

After the response: placeholders are unmasked back to their original values.
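
A minimal sketch of reversible masking. The regexes are illustrative (production patterns for Indian PII need checksum validation and more format variants), and the sketch assumes at most one value per PII type per prompt:

```python
import re

# Illustrative patterns only; production Indian-PII regexes need
# checksum validation, word-boundary care, and formatted variants.
PII_PATTERNS = {
    "[AADHAAR]": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    "[PAN]":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "[UPI]":     re.compile(r"\b[\w.\-]{2,}@[A-Za-z]{2,}\b"),
    "[IFSC]":    re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    "[PHONE]":   re.compile(r"(?<!\d)(?:\+91[\s\-]?)?[6-9]\d{9}\b"),
}

def mask(prompt):
    """Replace PII with placeholders; return the masked prompt
    plus the mapping needed to unmask the response."""
    mapping = {}
    for placeholder, pattern in PII_PATTERNS.items():
        match = pattern.search(prompt)
        if match:  # sketch assumes at most one value per PII type
            mapping[placeholder] = match.group()
            prompt = pattern.sub(placeholder, prompt)
    return prompt, mapping

def unmask(response, mapping):
    """Restore the original values in the LLM response."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response
```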


Dual-Layer Caching

Layer | Key                                | TTL      | Hit rate
Redis | Exact prompt hash                  | 1 hour   | ~15%
Titan | Semantic embedding (cosine ≥ 0.95) | 24 hours | ~25%

Combined cache hit rate: ~35–40%.
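
A sketch of the two-layer lookup, assuming a redis-py client and an embedding function supplied by the caller; the linear scan over an in-memory index stands in for a real vector store:

```python
import hashlib
import numpy as np

SEMANTIC_THRESHOLD = 0.95  # cosine cutoff from the table above

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cached_response(prompt, redis_client, embed, semantic_index):
    """Layer 1: exact prompt-hash lookup in Redis.
    Layer 2: nearest embedding in a semantic index
    (Titan embeddings via Bedrock in production)."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    hit = redis_client.get(key)
    if hit is not None:
        return hit  # exact-match hit (~15% of traffic)

    query = embed(prompt)
    best, best_sim = None, 0.0
    for vector, response in semantic_index:  # sketch: linear scan
        sim = cosine(query, vector)
        if sim >= SEMANTIC_THRESHOLD and sim > best_sim:
            best, best_sim = response, sim
    return best  # None on a full miss; the caller then calls the LLM
```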


Cost Tracking

Per-tenant daily and monthly tracking:

Metric        | Storage
Input tokens  | PostgreSQL
Output tokens | PostgreSQL
Cost (INR)    | PostgreSQL
Model usage   | PostgreSQL
Cache hits    | PostgreSQL

Budget enforcement: Hard stop at 120% of monthly allocation.
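
A sketch of the hard-stop check; the two accessor functions stand in for queries against the PostgreSQL usage tables and are purely illustrative:

```python
HARD_STOP_RATIO = 1.20  # hard stop at 120% of monthly allocation

class BudgetExceeded(Exception):
    pass

def enforce_budget(tenant_id, get_monthly_spend_inr, get_monthly_allocation_inr):
    """Raise before the LLM call if the tenant is past the hard stop.
    The two accessors are illustrative stand-ins for queries against
    the PostgreSQL usage tables."""
    spend = get_monthly_spend_inr(tenant_id)
    allocation = get_monthly_allocation_inr(tenant_id)
    if spend >= allocation * HARD_STOP_RATIO:
        raise BudgetExceeded(
            f"tenant {tenant_id}: ₹{spend:.2f} spent vs "
            f"₹{allocation:.2f} allocated (hard stop at 120%)"
        )
```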


Protocols

  • REST: Full feature parity; every orchestrator capability is available over REST
  • gRPC: Streaming, batch, and low-latency use cases
  • WebSocket: Real-time chat streaming
  • SSE: Server-Sent Events for frontend clients (see the sketch below)
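
As an example of the SSE transport, a frontend-style client might consume the stream like this; the endpoint path and event format are hypothetical:

```python
import json
import httpx

def stream_completion(prompt, base_url="http://localhost:8080"):
    """Yield tokens from the (hypothetical) SSE streaming endpoint."""
    with httpx.stream(
        "POST", f"{base_url}/v1/chat/stream",
        json={"prompt": prompt}, timeout=None,
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                payload = line[len("data: "):]
                if payload == "[DONE]":  # assumed end-of-stream sentinel
                    break
                yield json.loads(payload)["token"]
```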