EducationPrototype

Academic Knowledge RAG — knowledge orchestration for higher education institutions

Secure RAG assistant centralising regulations, syllabi and procedures to accelerate document retrieval and reduce compliance errors.

Executive summary

A document RAG prototype designed for higher education institutions. The goal: centralise academic reference materials, regulations and procedures in a secure assistant capable of answering with source citation and confidence level. Designed for a pilot deployment on a campus of 5,000 students.

Business problem

Teaching teams spend an average of 40 minutes per query locating a regulation or syllabus across fragmented systems. The risk of non-compliance during accreditation audits is real and costly. No existing system centralises sources with full response traceability.

Solution

Secure RAG system with an automated ingestion pipeline, role-based access control (RBAC), systematic source citation with confidence scores, and a conversational interface tailored to academic profiles. Every answer is traceable and auditable.
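The confidence-scored citation mechanism can be sketched as follows. The `Citation` shape, the 0.7 relevance threshold, and the confidence policy are illustrative assumptions, not the project's exact implementation:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    source: str   # document identifier, e.g. a regulation PDF
    score: float  # reranker relevance score in [0, 1]

def build_cited_answer(answer: str, chunks: list[dict], min_score: float = 0.7) -> dict:
    """Attach citations to an answer and derive an overall confidence level.

    Only chunks above the relevance threshold are cited; the confidence
    label is a simple function of the mean citation score (illustrative
    policy). An empty citation list is surfaced as low confidence so the
    user knows the answer is not grounded in a retrieved source.
    """
    citations = [Citation(c["source"], c["score"]) for c in chunks if c["score"] >= min_score]
    if not citations:
        return {"answer": answer, "citations": [], "confidence": "low"}
    mean = sum(c.score for c in citations) / len(citations)
    confidence = "high" if mean >= 0.85 else "medium"
    return {
        "answer": answer,
        "citations": [c.__dict__ for c in citations],
        "confidence": confidence,
    }
```

Persisting this payload per request is what makes each answer auditable after the fact.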

Target KPIs

  • < 8 s: P95 response time
  • 70%: document search time reduction
  • 95%: source citation precision
  • 0: compliance incidents (target)
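The P95 response-time KPI can be checked against recorded request latencies with a nearest-rank percentile; a minimal sketch (hypothetical helper, not part of the stack):

```python
import math

def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank P95: the smallest recorded latency such that at least
    95% of samples are at or below it. Used to check the < 8 s KPI."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```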

Technical architecture

Modular 5-layer pipeline: document ingestion (AWS Lambda + LangChain), vector storage (Supabase pgvector + HNSW), hybrid retrieval (vector + BM25 + Cohere Rerank v3), generation (Claude Sonnet 4.6 via LiteLLM), and observability (Langfuse). Authentication via OIDC (Supabase Auth); the audit trail is persisted in the database.
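The hybrid retrieval layer merges the vector and BM25 rankings before reranking. A minimal sketch of Reciprocal Rank Fusion, the standard way to do that merge (k=60 is the conventional constant; the function name is illustrative):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over several ranked lists of document ids
    (e.g. one from vector search, one from BM25).

    Each document accumulates 1 / (k + rank) for every list that returns
    it, so documents ranked well by both retrievers float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document appearing in both lists (like `d2` below) beats one ranked first by a single retriever, which is exactly the behaviour hybrid search relies on.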

General architecture

draw.io diagram: RAG architecture, general view. Layers and options shown:

  • Data sources: documents (PDF, Word, HTML; Confluence, Notion), databases (SQL, NoSQL, data warehouse), external APIs (REST, GraphQL, real-time webhooks), object storage (S3, GCS, Azure Blob, on-premise FS), streams (Kafka, Kinesis, real time)
  • Ingestion pipeline: parsing (OCR, metadata extraction), chunking (fixed or semantic, with overlap, hierarchical), embedding (text-embedding-3, Cohere, BGE, custom), index + metadata (namespace, tenant_id), daily or hourly refresh
  • Vector store: pgvector (Supabase / Postgres), Qdrant (self-hosted or cloud), Pinecone (managed, enterprise), Weaviate (native hybrid search)
  • Retrieval + reranking: query processing (HyDE reformulation, multi-query expansion), hybrid search (vector + BM25, RRF fusion), reranking (Cohere Rerank v3, cross-encoder), top-K (score filtering, MMR / diversity)
  • LLM generation: prompt builder (system + context, guards + instructions), LLM (Claude / GPT-5.4 / Mistral / local), citation layer (sources, scores, confidence, audit trail), semantic cache (Redis / Upstash, cost −68%), auth (OIDC, RBAC)
  • API + interface: API gateway (rate limiting, routing), backend (Next.js / FastAPI), client UI (chat, streaming)
  • Observability: Langfuse (traces, evals, cost), Arize Phoenix (drift, RAG quality)
  • Legend, options per layer: sources cloud (S3 / GCS / Azure) or on-premise; ingestion open source (LangChain, LlamaIndex) or managed (AWS Bedrock, Vertex AI); vector store pgvector (Postgres) / Qdrant (performance) / Pinecone (managed) / Weaviate (hybrid); LLM Claude Sonnet 4.6 (recommended) / GPT-5.4 / Mistral Small 4 (open source); API Next.js, FastAPI, LiteLLM gateway; monitoring Langfuse (open source) / LangSmith / Arize Phoenix; auth OIDC (Supabase Auth, Auth0, Keycloak); cache Redis / Upstash, LLM cost savings up to 68%
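The "MMR / diversity" step in the top-K stage trades relevance against redundancy among selected chunks. A minimal sketch of Maximal Marginal Relevance over precomputed similarities (the similarity inputs and the `lam` default are illustrative):

```python
def mmr(query_sim: dict[str, float],
        doc_sim: dict[tuple[str, str], float],
        candidates: list[str], k: int = 5, lam: float = 0.7) -> list[str]:
    """Maximal Marginal Relevance: greedily pick the chunk that best
    balances relevance to the query against similarity to chunks
    already selected. lam=1.0 is pure relevance; lower values
    penalise redundancy more strongly."""
    selected: list[str] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda d: lam * query_sim[d]
                   - (1 - lam) * max((doc_sim[(d, s)] for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected
```

With two near-duplicate highly relevant chunks, MMR keeps one and swaps the duplicate for a more diverse chunk, which matters when several regulation versions cover the same clause.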

Recommended stack

draw.io diagram: RAG architecture, recommended concrete stack.

  • Sources: AWS S3 (PDF, Word, HTML; encrypted bucket), Supabase Storage (user files, assets), APIs / CRM (Salesforce, Notion, REST), Kafka / MSK (real-time streams)
  • Ingestion (AWS Lambda + LangChain): AWS Lambda (parsing + OCR, triggered by S3 events), LangChain splitter (semantic chunking, 512 tokens, overlap 64), text-embedding-3-small (OpenAI API, 1536 dims, batched), SQS queue (async, retries, built-in DLQ)
  • Vector store (Supabase pgvector + pgvectorscale): documents table (id · content · embedding(1536) · metadata jsonb · tenant_id · created_at), HNSW index (cosine similarity, ef_construction=128), RLS enabled (row-level security)
  • Retrieval + reranking (LangChain / LangGraph): query router (HyDE expansion, intent classification), hybrid search (pgvector + tsvector, RRF with k=60), Cohere Rerank v3 (cross-encoder, top_k=5, score 0.7+), context window compression (LLMLingua)
  • Generation (LiteLLM gateway + Claude): LiteLLM gateway (routing, fallback, per-model budgets), Claude Sonnet 4.6 (primary, reasoning, temp=0.1, 1M context), citation builder (sources + scores, audit trail in Supabase), Upstash Redis (semantic cache, TTL 24h)
  • API + interface (Next.js + Supabase Auth): Next.js API (route handlers, streaming), Supabase Auth (OIDC, RBAC, JWT), chat UI (SSE streaming, Vercel)
  • Observability: Langfuse (traces, latency, cost), LangSmith (LangGraph traces)
  • Stack justifications: storage, AWS S3 + Supabase Storage (AES-256 encryption, versioning, fine-grained IAM); ingestion, AWS Lambda (serverless, pay-per-use) + SQS (retries, DLQ) + LangChain splitters; embedding, OpenAI text-embedding-3-small (1536 dims, best quality/cost ratio in 2026); vector DB, Supabase pgvector + pgvectorscale (HNSW, 471 QPS at 50M vectors, no additional infra); reranking, Cohere Rerank v3 (cross-encoder, +9% recall vs vector-only); LLM, Claude Sonnet 4.6 via LiteLLM (GPT-5.4 mini fallback, per-model budget, 1M context); monitoring, Langfuse self-hosted (MIT) + LangSmith for LangGraph traces (no vendor lock-in)
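The splitter settings above (512 tokens, overlap 64) amount to sliding-window chunking. A simplified sketch over a pre-tokenised list; the real pipeline would use a proper tokenizer and LangChain's semantic splitter:

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Fixed-size chunking with overlap, mirroring the splitter settings
    (512 tokens, overlap 64). Consecutive chunks share `overlap` tokens
    so that a clause split at a chunk boundary is still retrievable
    whole from at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```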

Competitive advantages

No SaaS product on the market combines confidence-scored citation, audit trail and granular RBAC control adapted to French and Swiss academic compliance requirements. The system is designed for accreditation, not just productivity.

Risks and mitigations

The primary risk is source-document quality: poorly structured PDFs degrade retrieval precision. Mitigation: a quality-validation pipeline at ingestion. Second risk: user adoption. Mitigation: a simple conversational interface and short onboarding. Third risk: LLM cost at scale. Mitigation: a semantic cache and an economical fallback model.

Impact

  • Prototype / evaluation in progress.
  • Detailed impact data available on request.


Project scope

Pilot scope: 1 institution, 3 departments, 5,000 source documents. POC duration: 6 weeks. Environment: AWS eu-west-1 + Supabase Europe. Governance: GDPR, EU-hosted data, no personal data ingested.

Hosting and resilience

Deployment: Vercel (frontend) + AWS Lambda (ingestion) + Supabase (DB + auth). Target availability: 99.5% SLA. Recovery: RTO < 1h, RPO < 24h. Semantic Redis cache (TTL 24h) to absorb load spikes.

Role

Architecture design, data ingestion design, RAG engineering, security review

Next steps

Industrialisation of the document pipeline, extended compliance rule coverage, and campus SSO integration.

Tech stack

Claude Sonnet 4.6 · pgvector · Supabase · LangChain · Cohere Rerank · AWS Lambda · AWS S3 · LiteLLM · Langfuse · Next.js · OIDC · Redis

Timeline

  1. Weeks 1–2, Ingestion: document corpus ingestion and indexing
  2. Weeks 3–4, Retrieval: retrieval, reranking and quality evaluation
  3. Week 5, Interface: interface, OIDC auth and audit trail
  4. Week 6, Pilot: user pilot and KPI measurement