Knowledge Base
Document ingestion, hybrid search, embeddings, chunking, and team-based access control.
The semantic layer behind Sentinel’s retrieval-augmented generation (RAG). It ingests documents, generates embeddings, and enables hybrid search across text and images.
Ingestion Pipeline
Document (PDF/DOCX/TXT)
│
▼
Parser ──► Text extraction
│
▼
Chunker ──► Token-based segmentation with overlap
│
▼
Embedding Service
├──► Text: all-MiniLM-L6-v2 (384-dim)
└──► Image: OpenCLIP ViT-B-32 (512-dim)
│
▼
OpenSearch Index
├──► BM25 (lexical)
└──► KNN (semantic)
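The sketch below shows one way a single index could serve both retrieval paths. It assumes the OpenSearch k-NN plugin and uses hypothetical field names (`content`, `text_embedding`, `image_embedding`) and a hypothetical helper, `check_dimension`. Only the dimensions come from the pipeline above. The `text` field is scored with BM25 by default, and the two `knn_vector` fields hold the text and image embeddings.

```python
import json

TEXT_DIM = 384   # all-MiniLM-L6-v2 text embeddings
IMAGE_DIM = 512  # OpenCLIP ViT-B-32 image embeddings


def build_index_body(text_dim: int = TEXT_DIM, image_dim: int = IMAGE_DIM) -> dict:
    """Create-index body for a hybrid BM25 + KNN index.

    Field names are hypothetical; only the field types follow OpenSearch conventions.
    """
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "content": {"type": "text"},  # analyzed text, scored with BM25
                "text_embedding": {"type": "knn_vector", "dimension": text_dim},
                "image_embedding": {"type": "knn_vector", "dimension": image_dim},
            }
        },
    }


def check_dimension(vector, expected: int) -> None:
    """Reject a vector whose length does not match the mapped dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dims, got {len(vector)}")


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

With the `opensearch-py` client, the body could be passed as `client.indices.create(index=name, body=build_index_body())`. Checking vector length before indexing catches a model/mapping mismatch before the request reaches the cluster.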
Hybrid Search
Combines BM25 keyword search with KNN semantic search using Reciprocal Rank Fusion:
score = 1 / (k + rank_bm25) + 1 / (k + rank_knn)
The default is k = 60. Results are re-ranked by descending fusion score.
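The fusion step fits in a few lines of Python. This sketch assumes each retriever returns an ordered list of document IDs. It follows the standard RRF convention: ranks start at 1, and a document missing from one list gets no contribution from that list. The function name `reciprocal_rank_fusion` is illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(bm25_ids, knn_ids, k: int = 60):
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion.

    Ranks are 1-based; a document absent from a list gets nothing from it.
    Returns (doc_id, score) pairs, highest fused score first.
    """
    scores = defaultdict(float)
    for ranked in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for deterministic output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25 = ["doc_a", "doc_b", "doc_c"]
    knn = ["doc_c", "doc_a", "doc_d"]
    for doc_id, score in reciprocal_rank_fusion(bm25, knn):
        print(f"{doc_id}: {score:.5f}")
```

In this example `doc_a` (1/61 + 1/62) edges out `doc_c` (1/63 + 1/61). Both appear in both lists, but `doc_a` has the better worst-case rank. This illustrates how RRF rewards agreement between the lexical and semantic retrievers.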
Smart Chunking
- Token-based: 512 tokens per chunk, 64-token overlap
- Boundary-aware: Respects paragraph and sentence boundaries
- Q&A extraction: Automatically detects FAQ pairs for separate indexing
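The token-budget and boundary rules above can be approximated as follows. This sketch makes several assumptions:

- Whitespace splitting stands in for the real tokenizer, so counts from a subword tokenizer will differ.
- Sentence detection is a naive regex.
- Overlap is taken as whole trailing sentences totalling at most 64 tokens.

Q&A extraction is not covered, and the function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into paragraphs (blank lines), then sentences (. ! ?)."""
    sentences = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if para:
            sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", para) if s)
    return sentences


def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64,
               tokenize=str.split) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens tokens, carrying
    up to `overlap` tokens of trailing sentences into the next chunk.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    chunks = []
    current = []      # (sentence, token_count) pairs in the open chunk
    current_len = 0
    for sent in split_sentences(text):
        n = len(tokenize(sent))
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(s for s, _ in current))
            # Keep trailing sentences totalling <= overlap tokens as shared context
            carried, carried_len = [], 0
            for s, c in reversed(current):
                if carried_len + c > overlap:
                    break
                carried.insert(0, (s, c))
                carried_len += c
            current, current_len = carried, carried_len
            # Drop carried context if it would push the new chunk over the limit
            while current and current_len + n > max_tokens:
                _, c = current.pop(0)
                current_len -= c
        current.append((sent, n))
        current_len += n
    if current:
        chunks.append(" ".join(s for s, _ in current))
    return chunks
```

Packing whole sentences keeps chunk edges on sentence boundaries, at the cost of slightly uneven chunk sizes. A sentence longer than `max_tokens` is emitted as its own chunk rather than split.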
Team Hierarchy & ACL
MongoDB-backed organizational structure:
Organization
├── Team A
│ ├── User 1 (read/write)
│ └── User 2 (read-only)
├── Team B
│ └── User 3 (read/write)
└── Shared Documents (all teams)
Search results are filtered by team membership.
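The sketch below shows how the membership filter might be expressed. It assumes team memberships are available as simple records, for example documents read from a MongoDB collection. It also assumes indexed documents carry hypothetical `team_id` and `shared` fields. The helpers `teams_for_user`, `can_write`, `acl_filter`, and `search_body` are illustrative names. The role strings `read_write` and `read_only` are assumptions standing in for the read/write and read-only permissions in the tree above.

```python
def teams_for_user(user_id: str, memberships) -> list[str]:
    """Team IDs the user belongs to; memberships are {user_id, team_id, role} dicts."""
    return sorted({m["team_id"] for m in memberships if m["user_id"] == user_id})


def can_write(user_id: str, team_id: str, memberships) -> bool:
    """True if the user holds the (hypothetical) read_write role in the team."""
    return any(
        m["user_id"] == user_id and m["team_id"] == team_id and m["role"] == "read_write"
        for m in memberships
    )


def acl_filter(team_ids: list[str]) -> dict:
    """Match documents owned by any of the user's teams, or shared with all teams."""
    return {
        "bool": {
            "should": [
                {"terms": {"team_id": team_ids}},
                {"term": {"shared": True}},
            ],
            "minimum_should_match": 1,
        }
    }


def search_body(query_text: str, team_ids: list[str], size: int = 10) -> dict:
    """BM25 query whose candidate set is restricted by the ACL filter."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"content": query_text}}],
                "filter": [acl_filter(team_ids)],  # filter context: no effect on scores
            }
        },
    }
```

Placing the ACL clause under `filter` restricts which documents can match without changing their BM25 scores. The KNN leg of a hybrid query would need an equivalent restriction. How that is expressed depends on the k-NN engine and the OpenSearch version.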
Session-Scoped Indexing
Documents uploaded to a chat session are:
- Indexed to a session-specific OpenSearch alias
- Visible only within that session
- Auto-deleted after 90 days
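This section does not say how the session alias or the 90-day cleanup is implemented. One plausible approach, sketched here as an assumption, is a filtered alias per session plus a periodic delete-by-query on upload time. The alias naming scheme, the `session_id` and `uploaded_at` fields, and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(days=90)


def session_alias(session_id: str) -> str:
    # Hypothetical naming scheme for the per-session alias
    return f"session-{session_id}"


def alias_actions(index: str, session_id: str) -> dict:
    """_aliases body: a filtered alias exposing only this session's documents."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": session_alias(session_id),
                    "filter": {"term": {"session_id": session_id}},
                }
            }
        ]
    }


def expired_docs_query() -> dict:
    """Delete-by-query body matching documents uploaded more than 90 days ago."""
    return {"query": {"range": {"uploaded_at": {"lt": "now-90d"}}}}


def is_expired(uploaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a session document is more than 90 days old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at > SESSION_TTL
```

With `opensearch-py`, the alias body could be sent as `client.indices.update_aliases(body=alias_actions(index_name, session_id))`. The cleanup could run as `client.delete_by_query(index=index_name, body=expired_docs_query())`. Searching through the alias rather than the underlying index is what keeps a session's documents invisible to other sessions.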