LLM Orchestrator

The unified LLM layer for Sentinel. Route across providers, enforce budgets, mask PII, and cache responses.


Supported Providers

Provider    | Models                         | Region
AWS Bedrock | Claude 3.5, Claude 3, Nova Pro | ap-south-1
OpenAI      | GPT-4o, GPT-4 Turbo, GPT-3.5   | us-east-1
Qwen        | Qwen2.5-72B                    | ap-south-1
Kimi        | Kimi k2.5                      | ap-south-1 (via Bedrock)
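
As a sketch, the routing table above might be expressed as a provider registry; the keys and model identifiers here are illustrative, not the orchestrator's actual schema:

```python
# Hypothetical provider registry mirroring the table above.
# Keys and model identifiers are illustrative, not the real schema.
PROVIDERS = {
    "bedrock": {
        "models": ["claude-3.5-sonnet", "claude-3", "nova-pro"],
        "region": "ap-south-1",
    },
    "openai": {
        "models": ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"],
        "region": "us-east-1",
    },
    "qwen": {
        "models": ["qwen2.5-72b"],
        "region": "ap-south-1",
    },
    "kimi": {
        "models": ["kimi-k2.5"],
        "region": "ap-south-1",  # served via Bedrock
    },
}
```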

Fallback Chains

Primary: Claude 3.5 Sonnet (Bedrock)
    ├──► Timeout? ──► GPT-4o (OpenAI)
    ├──► Rate limit? ──► Nova Pro (Bedrock)
    ├──► Error? ──► GPT-3.5 (OpenAI)
    └──► All fail? ──► Cached response or graceful degradation

Automatic retry with exponential backoff between providers.
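
A minimal sketch of this chain, assuming a generic call(provider, model, prompt) function and hypothetical exception types standing in for the providers' SDK errors:

```python
import time

# Hypothetical exception types; real provider SDKs raise their own.
class ProviderTimeout(Exception): pass
class ProviderRateLimited(Exception): pass

def complete(prompt, call, cached_response=None, base_delay=0.5):
    """Route per the chain above: the error type from the primary call
    selects the fallback model; total failure degrades to the cache."""
    try:
        return call("bedrock", "claude-3.5-sonnet", prompt)
    except ProviderTimeout:
        fallback = ("openai", "gpt-4o")
    except ProviderRateLimited:
        fallback = ("bedrock", "nova-pro")
    except Exception:
        fallback = ("openai", "gpt-3.5-turbo")

    for attempt in range(3):
        try:
            return call(*fallback, prompt)
        except Exception:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

    # All providers failed: cached response or graceful degradation.
    return cached_response or "Service temporarily unavailable."
```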


PII Masking

Before each LLM call, these identifiers are replaced with placeholders:

  • Aadhaar → [AADHAAR]
  • PAN → [PAN]
  • UPI → [UPI]
  • IFSC → [IFSC]
  • Mobile → [PHONE]

After the response: placeholders are unmasked back to their original values.
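
A minimal sketch of reversible masking. The regexes are illustrative (production patterns for Indian PII need checksum validation and more format variants), and the sketch assumes at most one value per PII type per prompt:

```python
import re

# Illustrative patterns only; production Indian-PII regexes need
# checksum validation, word-boundary care, and formatted variants.
PII_PATTERNS = {
    "[AADHAAR]": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    "[PAN]":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "[UPI]":     re.compile(r"\b[\w.\-]{2,}@[A-Za-z]{2,}\b"),
    "[IFSC]":    re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    "[PHONE]":   re.compile(r"(?<!\d)(?:\+91[\s\-]?)?[6-9]\d{9}\b"),
}

def mask(prompt):
    """Replace PII with placeholders; return the masked prompt
    plus the mapping needed to unmask the response."""
    mapping = {}
    for placeholder, pattern in PII_PATTERNS.items():
        match = pattern.search(prompt)
        if match:  # sketch assumes at most one value per PII type
            mapping[placeholder] = match.group()
            prompt = pattern.sub(placeholder, prompt)
    return prompt, mapping

def unmask(response, mapping):
    """Restore the original values in the LLM response."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response
```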


Dual-Layer Caching

Layer | Key                                | TTL      | Hit rate
Redis | Exact prompt hash                  | 1 hour   | ~15%
Titan | Semantic embedding (cosine ≥ 0.95) | 24 hours | ~25%

Combined cache hit rate: ~35–40%.
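
A sketch of the two-layer lookup, assuming a redis-py client and an embedding function supplied by the caller; the linear scan over an in-memory index stands in for a real vector store:

```python
import hashlib
import numpy as np

SEMANTIC_THRESHOLD = 0.95  # cosine cutoff from the table above

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cached_response(prompt, redis_client, embed, semantic_index):
    """Layer 1: exact prompt-hash lookup in Redis.
    Layer 2: nearest embedding in a semantic index
    (Titan embeddings via Bedrock in production)."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    hit = redis_client.get(key)
    if hit is not None:
        return hit  # exact-match hit (~15% of traffic)

    query = embed(prompt)
    best, best_sim = None, 0.0
    for vector, response in semantic_index:  # sketch: linear scan
        sim = cosine(query, vector)
        if sim >= SEMANTIC_THRESHOLD and sim > best_sim:
            best, best_sim = response, sim
    return best  # None on a full miss; the caller then calls the LLM
```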


Cost Tracking

Per-tenant daily and monthly tracking:

Metric        | Storage
Input tokens  | PostgreSQL
Output tokens | PostgreSQL
Cost (INR)    | PostgreSQL
Model usage   | PostgreSQL
Cache hits    | PostgreSQL

Budget enforcement: Hard stop at 120% of monthly allocation.
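
A sketch of the hard-stop check; the two accessor functions stand in for queries against the PostgreSQL usage tables and are purely illustrative:

```python
HARD_STOP_RATIO = 1.20  # hard stop at 120% of monthly allocation

class BudgetExceeded(Exception):
    pass

def enforce_budget(tenant_id, get_monthly_spend_inr, get_monthly_allocation_inr):
    """Raise before the LLM call if the tenant is past the hard stop.
    The two accessors are illustrative stand-ins for queries against
    the PostgreSQL usage tables."""
    spend = get_monthly_spend_inr(tenant_id)
    allocation = get_monthly_allocation_inr(tenant_id)
    if spend >= allocation * HARD_STOP_RATIO:
        raise BudgetExceeded(
            f"tenant {tenant_id}: ₹{spend:.2f} spent vs "
            f"₹{allocation:.2f} allocated (hard stop at 120%)"
        )
```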


Protocols

  • REST: Full feature parity; every orchestrator capability is available over REST
  • gRPC: Streaming, batch, and low-latency use cases
  • WebSocket: Real-time chat streaming
  • SSE: Server-Sent Events for frontend clients (see the sketch below)
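
As an example of the SSE transport, a frontend-style client might consume the stream like this; the endpoint path and event format are hypothetical:

```python
import json
import httpx

def stream_completion(prompt, base_url="http://localhost:8080"):
    """Yield tokens from the (hypothetical) SSE streaming endpoint."""
    with httpx.stream(
        "POST", f"{base_url}/v1/chat/stream",
        json={"prompt": prompt}, timeout=None,
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                payload = line[len("data: "):]
                if payload == "[DONE]":  # assumed end-of-stream sentinel
                    break
                yield json.loads(payload)["token"]
```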