LLM Orchestrator
Multi-provider routing, fallback chains, PII masking, cost tracking, and caching.
The unified LLM layer for Sentinel. Route across providers, enforce budgets, mask PII, and cache responses.
Supported Providers
| Provider | Models | Region |
|---|---|---|
| AWS Bedrock | Claude 3.5, Claude 3, Nova Pro | ap-south-1 |
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 | us-east-1 |
| Qwen | Qwen2.5-72B | ap-south-1 |
| Kimi | Kimi k2.5 | ap-south-1 (via Bedrock) |
Fallback Chains
```
Primary: Claude 3.5 Sonnet (Bedrock)
 ├──► Timeout?    ──► GPT-4o (OpenAI)
 ├──► Rate limit? ──► Nova Pro (Bedrock)
 ├──► Error?      ──► GPT-3.5 (OpenAI)
 └──► All fail?   ──► Cached response or graceful degradation
```
Failed calls are retried automatically with exponential backoff before falling through to the next provider in the chain.
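The chain above can be sketched as a loop over providers with exponential backoff between retries. The provider functions and exception classes here are placeholders, not Sentinel's real client API:

```python
import time

# Illustrative error classes; real provider SDKs raise their own exceptions.
class ProviderTimeout(Exception): pass
class ProviderRateLimit(Exception): pass

def call_with_fallback(prompt, chain, max_retries=3, base_delay=0.5):
    """Try each (name, call_fn) pair in order.

    Each provider gets max_retries attempts with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...) before the chain
    falls through to the next provider.
    """
    last_err = None
    for name, call_fn in chain:
        for attempt in range(max_retries):
            try:
                return name, call_fn(prompt)
            except (ProviderTimeout, ProviderRateLimit, RuntimeError) as err:
                last_err = err
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_err}")
```

A chain would then be wired as an ordered list, e.g. `[("claude-3.5-sonnet", call_bedrock), ("gpt-4o", call_openai)]`, with the cached-response fallback handled by the caller when `call_with_fallback` raises.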
PII Masking
Before the LLM call:
- Aadhaar → [AADHAAR]
- PAN → [PAN]
- UPI → [UPI]
- IFSC → [IFSC]
- Mobile → [PHONE]
After response: unmask back to original values.
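A minimal regex-based mask/unmask pass might look like the following. The patterns are simplified illustrations, not Sentinel's production rules; the sketch numbers each placeholder so repeated values unmask unambiguously:

```python
import re

# Simplified patterns for illustration only.
PATTERNS = {
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    "PAN":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "UPI":     re.compile(r"\b[\w.\-]+@[a-z]+\b"),
    "IFSC":    re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    "PHONE":   re.compile(r"\b[6-9]\d{9}\b"),
}

def mask(text):
    """Replace PII with numbered placeholders; return masked text + mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            token = f"[{label}_{len(mapping)}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, mapping

def unmask(text, mapping):
    """Restore original values in the LLM response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```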
Dual-Layer Caching
| Layer | Key | TTL | Hit Rate |
|---|---|---|---|
| Redis | Exact prompt hash | 1 hour | ~15% |
| Titan | Semantic embedding (cosine ≥ 0.95) | 24 hours | ~25% |
Combined cache hit rate: ~35–40%.
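The lookup order above (exact hash first, then semantic match) can be sketched with in-memory stores standing in for Redis and the Titan embedding index; `embed()` is a stand-in for the real embedding call:

```python
import hashlib
import math
import time

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class DualCache:
    def __init__(self, embed, exact_ttl=3600, semantic_ttl=86400, threshold=0.95):
        self.embed = embed
        self.exact_ttl, self.semantic_ttl = exact_ttl, semantic_ttl
        self.threshold = threshold
        self.exact = {}     # sha256(prompt) -> (expiry, response)
        self.semantic = []  # [(expiry, vector, response)]

    def get(self, prompt):
        now = time.time()
        key = hashlib.sha256(prompt.encode()).hexdigest()
        hit = self.exact.get(key)
        if hit and hit[0] > now:                        # layer 1: exact hash
            return hit[1]
        v = self.embed(prompt)
        for expiry, u, response in self.semantic:       # layer 2: cosine >= 0.95
            if expiry > now and cosine(u, v) >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        now = time.time()
        key = hashlib.sha256(prompt.encode()).hexdigest()
        self.exact[key] = (now + self.exact_ttl, response)
        self.semantic.append((now + self.semantic_ttl, self.embed(prompt), response))
```

The exact layer is checked first because a hash lookup is far cheaper than computing an embedding and scanning for a cosine match.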
Cost Tracking
Per-tenant daily and monthly tracking:
| Metric | Storage |
|---|---|
| Input tokens | PostgreSQL |
| Output tokens | PostgreSQL |
| Cost (INR) | PostgreSQL |
| Model usage | PostgreSQL |
| Cache hits | PostgreSQL |
Budget enforcement: calls are hard-stopped once a tenant reaches 120% of its monthly allocation.
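The 120% hard stop reduces to a per-call check against the running monthly spend. The INR-per-1K-token rates and helper names below are illustrative assumptions, not Sentinel's real pricing or API:

```python
HARD_STOP_RATIO = 1.20  # hard stop at 120% of the monthly allocation

def estimate_cost_inr(input_tokens, output_tokens, in_rate, out_rate):
    """Cost of one call, with rates quoted in INR per 1K tokens (illustrative)."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

def allow_call(month_spend_inr, call_cost_inr, monthly_allocation_inr):
    """Reject the call if it would push spend past 120% of the allocation."""
    return month_spend_inr + call_cost_inr <= monthly_allocation_inr * HARD_STOP_RATIO
```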
Protocols
- REST: Full feature parity
- gRPC: Streaming, batch, and low-latency use cases
- WebSocket: Real-time chat streaming
- SSE: Server-Sent Events for frontend
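For the SSE transport, the frontend receives newline-delimited frames. A minimal parser for decoded SSE text follows; the event names and payload shape are assumptions about the stream, but the framing rules (`event:`/`data:` fields, blank line terminating an event) follow the SSE format itself:

```python
def parse_sse(raw):
    """Yield (event, data) tuples from decoded SSE text.

    Per the SSE format: an `event:` field names the event (default
    "message"), consecutive `data:` lines are joined with newlines,
    and a blank line dispatches the accumulated event.
    """
    event, data = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:   # blank line ends an event
            yield event, "\n".join(data)
            event, data = "message", []
```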