Knowledge Base

The semantic layer behind Sentinel’s RAG. It ingests documents, generates embeddings, and enables hybrid search across text and images.


Ingestion Pipeline

Document (PDF/DOCX/TXT)
    │
    ▼
Parser ──► Text extraction
    │
    ▼
Chunker ──► Token-based segmentation with overlap
    │
    ▼
Embedding Service
    ├──► Text: all-MiniLM-L6-v2 (384-dim)
    └──► Image: OpenCLIP ViT-B-32 (512-dim)
    │
    ▼
OpenSearch Index
    ├──► BM25 (lexical)
    └──► KNN (semantic)
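
As an illustration of the last stage, here is a minimal sketch of an index body that could hold both embedding types alongside the raw text. The vector dimensions come from the diagram above; the field names (text, text_vector, image_vector) are assumptions made for this example, not Sentinel's actual schema.

```python
# Sketch of an OpenSearch index body with one lexical field and two k-NN fields.
# Field names are hypothetical; only the dimensions come from the pipeline diagram.
TEXT_DIM = 384   # all-MiniLM-L6-v2
IMAGE_DIM = 512  # OpenCLIP ViT-B-32

index_body = {
    "settings": {"index": {"knn": True}},  # enables k-NN search on this index
    "mappings": {
        "properties": {
            "text": {"type": "text"},  # scored with BM25
            "text_vector": {"type": "knn_vector", "dimension": TEXT_DIM},
            "image_vector": {"type": "knn_vector", "dimension": IMAGE_DIM},
        }
    },
}
```

Keeping text and image vectors in separate fields is necessary here because the two models produce vectors of different sizes.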

Hybrid search combines BM25 keyword search with KNN semantic search using Reciprocal Rank Fusion (RRF):

score = 1 / (k + rank_bm25) + 1 / (k + rank_knn)

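The default is k = 60, where rank_bm25 and rank_knn are the document's positions in the two result lists. Results are re-ranked by fusion score, highest first.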
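
The fusion formula above can be sketched in a few lines. This is a minimal illustration, assuming ranks start at 1 and that a document missing from one list simply contributes nothing for that list; the function name rrf_fuse and the sample document IDs are made up for the example.

```python
from collections import defaultdict

def rrf_fuse(bm25_ids, knn_ids, k=60):
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in (bm25_ids, knn_ids):
        # Ranks are 1-based: the top hit in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fusion score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25 = ["doc_a", "doc_b", "doc_c"]  # lexical ranking
knn = ["doc_b", "doc_d", "doc_a"]   # semantic ranking
fused = rrf_fuse(bm25, knn)
print([doc_id for doc_id, _ in fused])
```

Because RRF uses only ranks, the BM25 and KNN scores never need to be normalized onto a common scale.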


Smart Chunking

  • Token-based: 512 tokens per chunk, 64-token overlap
  • Boundary-aware: Respects paragraph and sentence boundaries
  • Q&A extraction: Automatically detects FAQ pairs for separate indexing
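
The token-based rule can be sketched as a sliding window. This is a simplified illustration only: it works on a pre-tokenized list and does not attempt the boundary-aware or Q&A behavior described above. The function name chunk_tokens is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 448 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

tokens = [f"t{i}" for i in range(1000)]  # stand-in for real tokenizer output
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

With these defaults, the last 64 tokens of each chunk reappear at the start of the next one, so a sentence that straddles a chunk edge is still seen whole in at least one chunk.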

Team Hierarchy & ACL

MongoDB-backed organizational structure:

Organization
├── Team A
│   ├── User 1 (read/write)
│   └── User 2 (read-only)
├── Team B
│   └── User 3 (read/write)
└── Shared Documents (all teams)

Search results are filtered by team membership: users see documents that belong to their own teams, plus shared documents.
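
One way to apply such a filter is to wrap every search in a bool query. The sketch below builds a plain OpenSearch query body; the field names team_id and shared, and both helper names, are assumptions for illustration rather than Sentinel's actual schema.

```python
def build_acl_filter(user_team_ids):
    """Match documents owned by one of the user's teams, or marked as shared."""
    return {
        "bool": {
            "should": [
                {"terms": {"team_id": sorted(user_team_ids)}},  # hypothetical field
                {"term": {"shared": True}},                     # hypothetical field
            ],
            "minimum_should_match": 1,
        }
    }

def with_acl(query, user_team_ids):
    """Wrap a search query so the ACL is applied as a non-scoring filter."""
    return {"bool": {"must": [query], "filter": [build_acl_filter(user_team_ids)]}}

secured = with_acl({"match": {"text": "vpn setup"}}, {"team-b", "team-a"})
```

Placing the ACL in the filter clause keeps it out of relevance scoring, so access rules restrict what is returned without changing how results are ranked.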


Session-Scoped Indexing

Documents uploaded to a chat session are:

  • Indexed to a session-specific OpenSearch alias
  • Visible only within that session
  • Auto-deleted after 90 days
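
The session rules above could be modeled with two small helpers. This is a sketch under stated assumptions: the alias naming scheme is invented for the example, and the 90-day window is assumed to run from the upload time.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)

def session_alias(session_id):
    """Name of the per-session alias (naming scheme is hypothetical)."""
    return f"kb-session-{session_id}"

def is_expired(uploaded_at, now=None):
    """True once a session document has reached the 90-day retention limit."""
    now = now or datetime.now(timezone.utc)
    return now - uploaded_at >= RETENTION

uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(session_alias("abc123"))
print(is_expired(uploaded, now=datetime(2024, 3, 31, tzinfo=timezone.utc)))
```

A scheduled cleanup job could call is_expired for each session document and delete those for which it returns True.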