Document Intelligence
AI-native extraction, classification, and structuring of financial documents for wealth management operations.
Scope — End-to-end AI pipeline that ingests unstructured financial PDFs and images, classifies document types, extracts structured securities data, and produces review-ready outputs with full provenance tracking.
Executive Summary
Document Intelligence is the foundational AI capability of the Sentinel platform. It eliminates manual data entry from portfolio statements, CAS reports, AIF statements, KYC forms, and MF order forms. The Nexus backend orchestrates a 12-stage agentic pipeline powered by multiple LLM providers (AWS Bedrock, OpenAI, Kimi) via the LLM Invocation Orchestrator, delivering sub-minute extraction for standard documents and robust handling for multi-entity, multi-page complex statements.
The Problem
Wealth management firms process thousands of financial documents monthly:
- Portfolio statements from 50+ AMCs with non-standard layouts
- AIF (Alternative Investment Fund) statements with complex L1/L2/L3 holdings
- KYC forms and MF order forms with handwritten fields
- CAS (Consolidated Account Statements) spanning years of transactions
Manual extraction is slow, error-prone, and does not scale. Legacy OCR tools fail on Indian financial document layouts and lack domain-specific entity recognition.
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Sentinel UI   │────▶│ Studio Middleware│────▶│  Nexus Backend  │
│  (Upload Drop)  │     │  (Auth + Proxy)  │     │ (12-Stage Pipe) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │                        │
                                 ▼                        ▼
                            ┌──────────┐        ┌──────────────────┐
                            │  Redis   │        │  Celery Worker   │
                            │  Queue   │        │ (Light/Heavy/XL) │
                            └──────────┘        └──────────────────┘
                                                          │
                             ┌────────────────────────────┘
                             ▼
                  ┌─────────────────────┐
                  │  LLM Orchestrator   │
                  │ (Bedrock / OpenAI)  │
                  └─────────────────────┘
Pipeline Stages
| Stage | Agent | Purpose |
|---|---|---|
| 1 | UploadValidator | File type, size, corruption checks |
| 2 | DocumentLoader | PDF → image rendering, page splitting |
| 3 | ClassificationAgent | CAS, AIF, KYC, Order Form, General |
| 4 | LayoutExtractor | Visual region detection (tables, headers) |
| 5 | DocumentStructureAgent | Hierarchical section mapping |
| 6 | EntityDetectionAgent | Client, account, scheme isolation |
| 7 | ParallelExtractionAgent | Securities, NAVs, dates, amounts |
| 8 | CrossPageStitcher | Multi-page table reconciliation |
| 9 | ValidationAgent | Cross-field consistency, sum checks |
| 10 | MetadataExtractionAgent | Bloomberg-style rich metadata |
| 11 | SummarizerAgent | Human-readable extraction summary |
| 12 | StorageAgent | Persist to MongoDB + OpenSearch |
OCR Branch — Handwritten forms route through KimiOCRAgent (Kimi k2.5 via Bedrock) for specialized handwriting recognition.
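As a rough illustration of how the twelve stages could be chained, the sketch below runs one handler per stage in table order and records completed stages so that a progress fraction can be reported. Only the stage names come from the table; `JobContext`, `run_pipeline`, `progress`, and the handler signature are hypothetical and do not describe the actual Nexus implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Stage names copied from the Pipeline Stages table; order matters.
STAGES = [
    "UploadValidator", "DocumentLoader", "ClassificationAgent",
    "LayoutExtractor", "DocumentStructureAgent", "EntityDetectionAgent",
    "ParallelExtractionAgent", "CrossPageStitcher", "ValidationAgent",
    "MetadataExtractionAgent", "SummarizerAgent", "StorageAgent",
]


@dataclass
class JobContext:
    """Hypothetical per-job state passed from stage to stage."""
    job_id: str
    data: dict[str, Any] = field(default_factory=dict)
    completed: list[str] = field(default_factory=list)


# Assumed handler shape: each stage mutates the shared context in place.
Handler = Callable[[JobContext], None]


def run_pipeline(ctx: JobContext, handlers: dict[str, Handler]) -> JobContext:
    """Run every stage in table order, recording each one as it finishes."""
    for name in STAGES:
        if name not in handlers:
            raise KeyError(f"no handler registered for stage {name}")
        handlers[name](ctx)
        ctx.completed.append(name)
    return ctx


def progress(ctx: JobContext) -> float:
    """Fraction of stages completed, suitable for a stage-by-stage progress bar."""
    return len(ctx.completed) / len(STAGES)
```

A handler that raises stops the run at that stage, leaving `ctx.completed` as a record of how far the job got.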
Personas & Journeys
Wealth Manager / Relationship Manager
- Uploads a client’s CAS PDF via Sentinel drag-and-drop
- Watches real-time progress bar (stage-by-stage updates via polling)
- Reviews extracted holdings table with confidence scores
- Downloads Excel export for client presentation
- Flags uncertain rows for operations review
Operations Analyst
- Monitors pipeline dashboard for failed extractions
- Opens review queue for low-confidence fields (< 0.85)
- Corrects mismatched ISINs or schemes
- Submits review — pipeline re-runs validation stage
- Tracks token cost per job for budget reconciliation
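The token cost reconciliation step can be pictured as a simple aggregation over per-call usage records. The sketch below is illustrative only: the `Usage` record, the model names, and the INR rates are placeholders invented for the example, not actual provider pricing or the platform's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical record of one LLM call made during a job."""
    stage: str
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder INR rates per 1,000 tokens as (input, output). Not real pricing.
RATES_INR_PER_1K = {
    "model-a": (0.25, 1.0),
    "model-b": (0.5, 2.0),
}


def cost_inr(usage: Usage) -> float:
    """Cost of a single call in INR under the placeholder rates."""
    rate_in, rate_out = RATES_INR_PER_1K[usage.model]
    return usage.input_tokens / 1000 * rate_in + usage.output_tokens / 1000 * rate_out


def breakdown(usages: list[Usage]) -> dict[tuple[str, str], float]:
    """Sum cost per (stage, model) pair for one job."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for usage in usages:
        totals[(usage.stage, usage.model)] += cost_inr(usage)
    return dict(totals)
```

Summing the values of `breakdown(...)` gives the per-job total an analyst would reconcile against budget.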
Product Admin
- Configures schema templates for new document types
- Manages cache warming for frequent AMC layouts
- Monitors extraction accuracy trends per document class
- Adjusts confidence thresholds by tenant
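Confidence gating with tenant-level thresholds might look like the sketch below. The 0.85 default matches the review-queue cutoff mentioned for the Operations Analyst; the function names, the override mapping, and the field-to-confidence dictionary are assumptions made for illustration.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.85  # review-queue cutoff stated in this document


def threshold_for(tenant_id: str, overrides: Optional[dict[str, float]] = None) -> float:
    """Return the tenant's configured threshold, or the default if none is set."""
    return (overrides or {}).get(tenant_id, DEFAULT_THRESHOLD)


def fields_needing_review(
    confidences: dict[str, float],
    tenant_id: str,
    overrides: Optional[dict[str, float]] = None,
) -> list[str]:
    """List field names whose confidence falls strictly below the tenant threshold."""
    cutoff = threshold_for(tenant_id, overrides)
    return sorted(name for name, score in confidences.items() if score < cutoff)
```

A field scoring exactly at the threshold is not queued, which follows the strict `< 0.85` wording above.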
Key Features
| Feature | Detail |
|---|---|
| Multi-Entity Extraction | Detects and isolates multiple clients/accounts in a single document |
| Schema-Driven Output | L1 (Transactions), L2 (Holdings), L3 (Underlying), L1+L2 (AIF SoA) |
| Excel Export | Custom-formatted export with preview mode before download |
| Human Review | Confidence-gated review workflow with per-field accept/reject/edit |
| Token Cost Tracking | Per-document INR cost breakdown by model and stage |
| Async Queues | Light (≤10 pp), Heavy (11-100 pp), XL (>100 pp) with Celery + Redis |
| Webhook Callbacks | Notifies Zen chatbot and external systems on completion |
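The page-count boundaries in the Async Queues row translate directly into a routing function. The sketch below assumes the queue names `light`, `heavy`, and `xl`; the actual Celery queue names are not given in this document.

```python
def queue_for(page_count: int) -> str:
    """Pick a queue from the page count: Light (<=10), Heavy (11-100), XL (>100)."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= 10:
        return "light"  # hypothetical queue name
    if page_count <= 100:
        return "heavy"  # hypothetical queue name
    return "xl"  # hypothetical queue name
```

With Celery, the returned name could be passed as the `queue` option of `apply_async` when the job is enqueued.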
API Surface
All endpoints are proxied through Studio Middleware at /api/v1/nexus/*.
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/v1/upload/ | Sync upload with pre-classification |
| POST | /api/v2/pipeline/process | Async pipeline kickoff |
| GET | /api/v2/status/process/{id}/progress | Real-time stage progress |
| GET | /api/v1/export/{job_id}/excel | Excel download |
| PUT | /api/v1/review/{review_id} | Submit human review |
| GET | /api/v1/token-usage/{doc_id} | Cost and token audit |
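A client that follows the async flow would typically kick off processing and then poll the progress endpoint until the job finishes. The sketch below shows only the polling half, and it is a guess at the shape of such a client: the response fields (`status`, `stage`), the terminal status values, and the injected `fetch` callable are assumptions, since the response schema is not documented here.

```python
import time
from typing import Any, Callable

# Path taken from the endpoint table; {id} is the process identifier.
PROGRESS_PATH = "/api/v2/status/process/{id}/progress"

# Assumed terminal states; the real API may use different values.
TERMINAL_STATUSES = {"completed", "failed"}


def progress_url(base_url: str, process_id: str) -> str:
    """Build the progress URL for one process."""
    return base_url.rstrip("/") + PROGRESS_PATH.format(id=process_id)


def poll_until_done(
    fetch: Callable[[str], dict[str, Any]],
    url: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch(url) until the payload reports a terminal status or attempts run out."""
    for _ in range(max_attempts):
        payload = fetch(url)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")
```

Passing `fetch` in as a parameter keeps the loop independent of any HTTP library and of the middleware's authentication scheme, both of which are outside the scope of this page.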
Security, Compliance & Operations
- Data Residency — All document binaries stored in tenant-scoped S3 buckets (ap-south-1)
- Encryption — AES-256 at rest (S3 SSE), TLS 1.3 in transit
- PII Handling — Aadhaar, PAN, IFSC, UPI identifiers masked before LLM calls via Orchestrator
- Retention — Configurable per tenant; default 7 years for financial records
- Audit — Every extraction job logged with full provenance (user, timestamp, model versions, confidence scores)
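Masking identifiers before text reaches an LLM can be approximated with pattern substitution. The sketch below is a simplified stand-in, not the Orchestrator's actual masking logic: the regular expressions follow the commonly published formats (PAN as five letters, four digits, one letter; Aadhaar as twelve digits, often grouped in fours; IFSC as four letters, a zero, and six alphanumerics), the UPI pattern is deliberately loose and will also match the local part of some email addresses, and the placeholder tokens are invented for the example.

```python
import re

# Ordered (label, pattern) pairs; patterns are simplified assumptions.
PII_PATTERNS = [
    ("PAN", re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")),
    ("IFSC", re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")),
    ("AADHAAR", re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")),
    ("UPI", re.compile(r"\b[\w.\-]{2,}@[A-Za-z]{2,}\b")),
]


def mask_pii(text: str) -> str:
    """Replace each matched identifier with a bracketed label such as [PAN]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

A production masker would likely add checksum validation (for example, the Aadhaar check digit) and a reversible mapping so that extracted values can be restored after the LLM call; neither is shown here.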
Related Capabilities
- Wealth Understanding — Document Q&A over extracted holdings
- Portfolio Intelligence — Analytics computed from extracted data
- Digital Advisor — Chat interface that triggers document uploads